Patent classifications
G06F18/24765
Distributed predictive analytics data set
A novel distributed method for machine learning is described, where the algorithm operates on a plurality of data silos, such that the privacy of the data in each silo is maintained. In some embodiments, the attributes of the data and the features themselves are kept private within the data silos. The method includes a distributed learning algorithm whereby a plurality of data spaces are co-populated with artificial, evenly distributed data, and then the data spaces are carved into smaller portions whereupon the numbers of real and artificial data points in each portion are compared. Through an iterative process, clusters having less than evenly distributed real data are discarded. A plurality of final quality control measurements are used to merge clusters that are too similar to be meaningful. These distributed quality control measures are then combined from each of the data silos to derive an overall quality control metric.
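The comparison of real and artificial point counts can be illustrated with a minimal sketch of a single carving step inside one silo. It assumes a two-dimensional data space, a regular grid as the "smaller portions", and a perfectly even artificial population, so every cell holds the same artificial count. The function name `dense_cells` and all of its parameters are hypothetical; the abstract does not specify how the space is carved or how the silos exchange results.

```python
from collections import Counter


def dense_cells(real_points, bounds, cells_per_axis, artificial_per_cell):
    """Return the grid cells whose real-point count exceeds the artificial count."""
    (xmin, xmax), (ymin, ymax) = bounds

    def cell_of(point):
        x, y = point
        # Map each coordinate to a cell index; clamp the upper edge into the last cell.
        cx = min(int((x - xmin) / (xmax - xmin) * cells_per_axis), cells_per_axis - 1)
        cy = min(int((y - ymin) / (ymax - ymin) * cells_per_axis), cells_per_axis - 1)
        return cx, cy

    real_counts = Counter(cell_of(p) for p in real_points)
    # Cells with no more real points than artificial points are discarded.
    return sorted(cell for cell, n in real_counts.items() if n > artificial_per_cell)


real = [(0.1, 0.1), (0.15, 0.12), (0.2, 0.05), (0.9, 0.9)]
kept = dense_cells(real, ((0.0, 1.0), (0.0, 1.0)), cells_per_axis=2, artificial_per_cell=2)
print(kept)
```

Repeating the step on each surviving cell with a finer grid would give one possible form of the iterative process; merging similar clusters and combining quality metrics across silos are outside this sketch.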
METHOD AND SYSTEM FOR IDENTIFYING PERSONALLY IDENTIFIABLE INFORMATION (PII) THROUGH SECRET PATTERNS
This disclosure relates to a method and system for identifying Personally Identifiable Information (PII) through secret patterns. The method includes receiving user data from at least one data source through a plurality of communication channels. The user data includes PII and non-PII. The user data is associated with a user. The PII includes a plurality of personal identifiers. The method further includes identifying the PII in the user data through a predictive model. The method further includes generating a secret pattern based on the PII identified through the predictive model. The secret pattern is an identifiable label. The method further includes adding the secret pattern to each of the plurality of personal identifiers in the PII. The method further includes identifying each of the plurality of personal identifiers through the secret pattern in real-time, when the user data is transmitted from the at least one data source to at least one data destination.
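The tag-then-detect flow can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: a regular expression that only finds e-mail addresses stands in for the predictive model, and the label format, the HMAC-based derivation, and the names `make_secret_pattern`, `tag_pii`, and `find_tagged` are all hypothetical.

```python
import hashlib
import hmac
import re

# Stand-in for the predictive model: a regex that only finds e-mail addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def make_secret_pattern(key):
    """Derive a short label from a secret key (hypothetical scheme)."""
    digest = hmac.new(key, b"pii-label", hashlib.sha256).hexdigest()[:8]
    return "<<" + digest + ">>"


def tag_pii(text, pattern):
    """Append the secret pattern to every identifier the detector finds."""
    return EMAIL_RE.sub(lambda m: m.group(0) + pattern, text)


def find_tagged(text, pattern):
    """Recover tagged identifiers by searching for the label alone."""
    return re.findall(r"(\S+)" + re.escape(pattern), text)


pattern = make_secret_pattern(b"demo-key")
tagged = tag_pii("mail alice@example.com or bob@example.org now", pattern)
print(find_tagged(tagged, pattern))
```

Once the label is attached at the source, a downstream component only needs the pattern to locate identifiers in transit; it does not have to rerun the detector.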
Data catalog automatic generation system and data catalog automatic generation method
A technology is disclosed that makes it possible even for an analyst who has little knowledge of field data to select and use analysis data in analysis. A data catalog automatic generation system that generates a catalog tag to be used to select analysis data from collected field data is configured such that, based on a set classification rule input, a relationship between an objective variable as an analysis perspective relating to field data and an explanatory variable or a causal relationship between a plurality of the explanatory variables is extracted, and based on a result of the extraction, a catalog tag of the objective variable and a catalog tag of the explanatory variable are specified and attached.
MACHINE LEARNING BASED MODELS FOR OBJECT RECOGNITION
Machine learning based models recognize objects in images. Specific features of the object are extracted from the image using machine learning based models. The specific features extracted from the image assist deep learning based models in identifying subtypes of a type of object. The system recognizes the objects and collections of objects and determines whether the arrangement of objects violates any predetermined policies. For example, a policy may specify relative positions of different types of objects, height above ground at which certain types of objects are placed, or an expected number of certain types of objects in a collection.
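The policy check applied after recognition can be sketched as a plain rule evaluation over detections. The detection fields (`type`, `height_m`), the function `check_policies`, and both policy tables are hypothetical; the abstract names the kinds of policy (height above ground, expected count) but not their representation, and the recognition models themselves are not shown.

```python
def check_policies(detections, max_height_m, expected_counts):
    """Return a list of human-readable policy violations."""
    violations = []
    counts = {}
    for det in detections:
        counts[det["type"]] = counts.get(det["type"], 0) + 1
        # Height policy: flag objects placed above the limit for their type.
        limit = max_height_m.get(det["type"])
        if limit is not None and det["height_m"] > limit:
            violations.append(f"{det['type']} at {det['height_m']} m exceeds {limit} m")
    # Count policy: each listed type must appear exactly the expected number of times.
    for obj_type, expected in expected_counts.items():
        found = counts.get(obj_type, 0)
        if found != expected:
            violations.append(f"expected {expected} {obj_type}, found {found}")
    return violations


detections = [
    {"type": "antenna", "height_m": 12.0},
    {"type": "antenna", "height_m": 31.5},
    {"type": "radio", "height_m": 10.0},
]
print(check_policies(detections, {"antenna": 30.0}, {"antenna": 3, "radio": 1}))
```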
FACTORIZED NEURAL NETWORK
Aspects of the present disclosure relate to factorized neural network techniques. In examples, a layer of a machine learning model is factorized and initialized using spectral initialization. For example, an initial layer parameterized using an initial matrix is processed such that it is instead parameterized by the product of two or more matrices, thereby resulting in a factorized machine learning model. An optimizer associated with the machine learning model may also be processed to adapt a regularizer accordingly. For example, a regularizer using a weight decay function may be adapted to instead use a Frobenius decay function with respect to the factorized model layer. The factorized machine learning model may be trained using the processed optimizer and subsequently used to generate inferences.
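The difference between the two regularizers can be shown with a small sketch. It assumes that "Frobenius decay" means penalizing the squared Frobenius norm of the product of the factors, while ordinary weight decay penalizes each factor separately; the abstract does not define the function, so this reading is an assumption. The helper names are hypothetical, and spectral initialization is not covered here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def sq_frobenius(m):
    """Squared Frobenius norm: the sum of squared entries."""
    return sum(x * x for row in m for x in row)


def weight_decay(u, v):
    """Ordinary weight decay applied to each factor separately."""
    return sq_frobenius(u) + sq_frobenius(v)


def frobenius_decay(u, v):
    """Penalty on the product of the factors, i.e. on the layer they represent."""
    return sq_frobenius(matmul(u, v))


u = [[1.0, 0.0], [0.0, 2.0]]
v = [[3.0, 0.0], [0.0, 0.5]]
# Rescaling the factors (u * 2, v / 2) leaves their product unchanged.
u2 = [[2.0, 0.0], [0.0, 4.0]]
v2 = [[1.5, 0.0], [0.0, 0.25]]

print(frobenius_decay(u, v), frobenius_decay(u2, v2))  # same value for both
print(weight_decay(u, v), weight_decay(u2, v2))        # values differ
```

Under this reading, the penalty depends only on the matrix the factors represent, not on how that matrix happens to be split between them.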
Apparatus and method for image processing for machine learning
An image processing apparatus includes a superpixel extractor configured to extract a plurality of superpixels from an input original image, a backbone network including N feature extracting layers (here, N is a natural number of two or more) which divide the input original image into grids including a plurality of regions and generate an output value including a feature value for each of the divided regions, and a superpixel pooling layer configured to generate a superpixel feature value corresponding to each of the plurality of superpixels using a first output value to an N-th output value output from each of the N feature extracting layers.
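A minimal sketch of the pooling step, assuming that each layer emits one scalar feature per grid region and that pooling means averaging the region features over the pixels of each superpixel. The abstract does not state the pooling rule, so averaging is an assumption, and the name `superpixel_pool` is hypothetical.

```python
def superpixel_pool(labels, feature_maps):
    """Average grid-region features over the pixels of each superpixel.

    labels: H x W list of superpixel ids, one per pixel.
    feature_maps: one 2-D grid per layer, each holding one value per region.
    Returns {superpixel_id: [pooled value for each layer]}.
    """
    height, width = len(labels), len(labels[0])
    sums, counts = {}, {}
    for y in range(height):
        for x in range(width):
            sp = labels[y][x]
            counts[sp] = counts.get(sp, 0) + 1
            acc = sums.setdefault(sp, [0.0] * len(feature_maps))
            for i, fmap in enumerate(feature_maps):
                # Find the grid region of this layer that covers pixel (y, x).
                gy = y * len(fmap) // height
                gx = x * len(fmap[0]) // width
                acc[i] += fmap[gy][gx]
    return {sp: [s / counts[sp] for s in acc] for sp, acc in sums.items()}


# A 4 x 4 image split into a left superpixel (0) and a right superpixel (1).
labels = [[0, 0, 1, 1] for _ in range(4)]
layer1 = [[1.0, 2.0], [3.0, 4.0]]  # 2 x 2 grid of regions
layer2 = [[10.0]]                  # 1 x 1 grid (coarsest layer)
print(superpixel_pool(labels, [layer1, layer2]))
```

Because every layer is indexed through the pixel coordinates, grids of different resolution contribute to the same superpixel feature vector without resampling.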
ESTIMATING MATERIALIZED VIEW REFRESH DURATION
Techniques for a database management system to estimate a time needed to refresh a materialized view. This is followed by an approach that uses the estimated refresh duration to determine an optimized schedule for refreshing the materialized view. The approach combines the refresh duration estimate with a query rewrite pattern prediction for the materialized view and a quiet period prediction for the materialized view to determine the optimized refresh schedule for the materialized view.
PREDICTING FUTURE QUIET PERIODS FOR MATERIALIZED VIEWS
Techniques for a database management system to predict when in the future a materialized view will have a quiet period during which the materialized view will not be stale. This is followed by an approach that uses the quiet period prediction to determine an optimized schedule for refreshing the materialized view. The approach combines the quiet period prediction with a query rewrite pattern prediction for the materialized view and an estimated refresh duration for the materialized view to determine the optimized refresh schedule for the materialized view.
PREDICTING FUTURE QUERY REWRITE PATTERNS FOR MATERIALIZED VIEWS
Techniques for a database management system to predict when in the future a materialized view will be used for query rewrite. This is followed by an approach that uses the query rewrite pattern prediction to determine an optimized schedule for refreshing the materialized view. The approach combines the query rewrite pattern prediction with a quiet period prediction for the materialized view and an estimated refresh duration for the materialized view to determine the optimized refresh schedule for the materialized view.
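One way the three predictions could be combined is sketched below. The scheduling rule is a hypothetical heuristic, not the disclosed optimizer: it picks the earliest quiet period in which a refresh fits and after which a predicted query rewrite still falls inside the same quiet period, so the view is fresh when it is used. Times are plain numbers in an arbitrary unit, and the name `pick_refresh_start` is an assumption.

```python
def pick_refresh_start(quiet_periods, rewrite_times, refresh_duration):
    """Return a refresh start time, or None if no quiet period qualifies.

    quiet_periods: list of (start, end) intervals predicted to be quiet.
    rewrite_times: predicted times at which the view is used for query rewrite.
    refresh_duration: estimated time needed to refresh the view.
    """
    for start, end in sorted(quiet_periods):
        finish = start + refresh_duration
        if finish > end:
            continue  # the refresh would not complete inside this quiet period
        # Require a predicted use after the refresh finishes but before the period ends.
        if any(finish <= t <= end for t in rewrite_times):
            return start
    return None


quiet = [(0, 5), (10, 30)]
rewrites = [3, 25]
print(pick_refresh_start(quiet, rewrites, refresh_duration=8))   # 10
print(pick_refresh_start(quiet, rewrites, refresh_duration=25))  # None
```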
Word embedding for non-mutually exclusive categorical data
A machine learning model, including: a categorical input feature, having a defined set of values; a plurality of non-categorical input features; a word embedding layer configured to convert the categorical input feature into an output in a word space having two dimensions; and a machine learning network configured to receive the output of the word embedding layer and the plurality of non-categorical input features and to produce a machine learning model output.
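The data flow through the embedding layer can be sketched as a table lookup followed by concatenation. The sketch assumes that several categorical values may be active at once and that their two-dimensional vectors are summed; the abstract fixes the two-dimensional word space but not the combination rule, so the sum is an assumption. The table values are placeholders for learned parameters, and the names `embed` and `build_input` are hypothetical.

```python
def embed(active_values, table):
    """Sum the 2-D vectors of every active categorical value."""
    x = sum(table[v][0] for v in active_values)
    y = sum(table[v][1] for v in active_values)
    return [x, y]


def build_input(active_values, numeric_features, table):
    """Concatenate the embedding output with the non-categorical features."""
    return embed(active_values, table) + list(numeric_features)


# Placeholder embedding table; in a trained model these values would be learned.
table = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [-1.0, 1.0]}
print(build_input(["a", "c"], [0.5, 3.0], table))  # [0.0, 1.0, 0.5, 3.0]
```

The resulting vector is what the downstream machine learning network would receive; that network is not part of this sketch.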