Patent classifications
G06F40/216
Method and apparatus for detecting anomalies in mission critical environments
A system and method for detecting anomalies in mission-critical environments. The method includes: parsing at least one received data set into a text structure; isolating a protocol language of the at least one received data set, wherein the protocol language is a standardized pattern for communication over at least one protocol; generating at least one document from the contents of the received at least one data set, wherein the at least one document includes at least one parsed text structure referencing a unique identifier, and wherein the at least one parsed text structure is organized within the at least one document according to a natural language scheme; detecting insights in the generated documents; extracting rules from the detected insights; and detecting anomalies by applying the extracted rules.
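The abstract's parse, group, extract-rules and detect steps can be illustrated with a minimal Python sketch. It assumes a hypothetical `key=value;` record format in which an `id` field serves as the unique identifier. It also uses a deliberately simple stand-in for "insights": the set of values each field took in baseline traffic. Protocol-language isolation and the natural language scheme are not modeled. The function names are illustrative and do not come from the patent.

```python
from collections import defaultdict


def parse_record(line):
    # Hypothetical wire format (assumption): "id=42;cmd=READ;status=OK"
    return dict(part.split("=", 1) for part in line.strip().split(";") if part)


def build_documents(lines):
    # Group parsed records into one "document" per unique identifier.
    docs = defaultdict(list)
    for line in lines:
        record = parse_record(line)
        docs[record.pop("id")].append(record)
    return dict(docs)


def extract_rules(docs):
    # Toy "insight": the set of values each field took in baseline traffic.
    allowed = defaultdict(set)
    for records in docs.values():
        for record in records:
            for field, value in record.items():
                allowed[field].add(value)
    return dict(allowed)


def detect_anomalies(docs, rules):
    # Report every field value that the baseline never produced.
    anomalies = []
    for doc_id, records in docs.items():
        for record in records:
            for field, value in record.items():
                if value not in rules.get(field, set()):
                    anomalies.append((doc_id, field, value))
    return anomalies


baseline = build_documents(["id=1;cmd=READ;status=OK", "id=2;cmd=STOP;status=OK"])
rules = extract_rules(baseline)
print(detect_anomalies(build_documents(["id=7;cmd=WRITE;status=OK"]), rules))
```

A real system would presumably learn richer rules than per-field value whitelists. The sketch only shows how rules extracted from one set of documents can be applied to another.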
Systems and methods for mutual learning for topic discovery and word embedding
Described herein are embodiments of systems and methods for mutual machine learning with global topic discovery and local word embedding. Both topic modeling and word embedding map documents onto a low-dimensional space, with the former clustering words into a global topic space and the latter mapping words into a local continuous embedding space. Embodiments of the Topic Modeling and Sparse Autoencoder (TMSA) framework unify these two complementary patterns by constructing a mutual learning mechanism between word co-occurrence based topic modeling and an autoencoder. In embodiments, word topics generated by topic modeling are passed into the autoencoder to impose topic sparsity, so that the autoencoder learns topic-relevant word representations. In return, the word embeddings learned by the autoencoder are sent back to topic modeling to improve the quality of the generated topics. Performance evaluation on various datasets demonstrates the effectiveness of the disclosed TMSA framework in discovering topics and embedding words.
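The topic-modeling side of TMSA is described as word co-occurrence based. The following small sketch shows only that statistic: counts of word pairs that appear near each other. It is not the TMSA framework itself. Whitespace tokenization and the fixed forward-looking `window` are assumptions made for illustration.

```python
from collections import Counter


def cooccurrence_counts(documents, window=2):
    """Count unordered word pairs that occur within `window` tokens of each other."""
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()  # naive whitespace tokenization (assumption)
        for i, word in enumerate(tokens):
            # Look only forward so each pair occurrence is counted once.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                other = tokens[j]
                if other != word:
                    counts[tuple(sorted((word, other)))] += 1
    return counts


counts = cooccurrence_counts(["topic model word", "word topic"], window=2)
print(counts[("topic", "word")])  # prints 2
```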
Synthetic training examples from advice for training autonomous agents
Techniques are disclosed for training a machine learning model to perform actions within an environment. In one example, an input device receives a declarative statement. A computation engine selects, based on the declarative statement, a template that includes a template action performable within the environment. The computation engine generates, based on the template, synthetic training episodes. The computation engine further generates experiential training episodes, each experiential training episode collected by a machine learning model from past actions performed by the machine learning model. Each synthetic training episode and experiential training episode comprises an action and a reward. A machine learning system trains, with the synthetic training episodes and the experiential training episodes, the machine learning model to perform the actions within the environment.
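The data flow in this abstract runs from advice to template to synthetic episodes, which are then pooled with experiential episodes. A rough Python sketch of that flow follows. The keyword-based template lookup, the `TEMPLATES` table and the uniform sampling of a mixed batch are hypothetical simplifications. The actual model-training step is omitted.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    state: str
    action: str
    reward: float


# Hypothetical template library: keyword in the advice -> (template action, reward).
TEMPLATES = {
    "avoid": ("retreat", 1.0),
    "pick up": ("grab", 1.0),
}


def select_template(statement):
    # Pick the first template whose keyword appears in the declarative statement.
    for keyword, template in TEMPLATES.items():
        if keyword in statement.lower():
            return template
    raise ValueError(f"no template matches: {statement!r}")


def synthetic_episodes(statement, states):
    # Instantiate the template's action, with its reward, in each given state.
    action, reward = select_template(statement)
    return [Episode(state, action, reward) for state in states]


def mixed_batch(synthetic, experiential, size, seed=0):
    # Uniformly sample a training batch from both episode sources.
    pool = synthetic + experiential
    return random.Random(seed).sample(pool, min(size, len(pool)))


synthetic = synthetic_episodes("Avoid the lava", ["s1", "s2"])
experiential = [Episode("s3", "grab", 0.0)]  # e.g. collected from the agent's own actions
print(mixed_batch(synthetic, experiential, 2))
```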
Systems and methods for identification and management of compliance-related information associated with enterprise IT networks
Various examples are provided related to identification of protected information elements associated with unique entities in data files present in data file collections associated with enterprise IT networks. The unique entities can be associated with one or more entity identifications in one or more data files. Computer-generated identification of entity identifications and protected information elements can be conducted, in part, with at least some human review. Information generated according to the disclosed methodology can be used to generate plans for the time and number of human reviewers needed to review data files. Information generated from the processes herein can be configured as user notifications, reports, dashboards, machine learning for subsequent data file analyses, and notifications of unique entities having protected information elements present in one or more data files.
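Planning the time and number of human reviewers comes down to simple capacity arithmetic. The following Python sketch assumes a constant review time per file and a fixed number of working hours per reviewer per day. Both are illustrative assumptions, and the function names are hypothetical.

```python
import math


def reviewers_needed(files_to_review, minutes_per_file, hours_per_day, days_available):
    """Reviewers required to finish `files_to_review` files before the deadline."""
    total_hours = files_to_review * minutes_per_file / 60
    hours_per_reviewer = hours_per_day * days_available
    return math.ceil(total_hours / hours_per_reviewer)


def days_needed(files_to_review, minutes_per_file, hours_per_day, reviewers):
    """Working days a fixed team needs to review `files_to_review` files."""
    total_hours = files_to_review * minutes_per_file / 60
    return math.ceil(total_hours / (hours_per_day * reviewers))


# 12,000 flagged files at 3 minutes each is 600 reviewer-hours.
print(reviewers_needed(12_000, 3, 8, 10))  # 600 / 80 = 7.5, rounded up to 8
print(days_needed(12_000, 3, 8, 5))        # 600 / 40 = 15
```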
Automated determination of data values for form fields
Some embodiments provide a non-transitory machine-readable medium that stores a program. The program receives from a client device an image and a user identifier associated with a user of the client device. Based on the user identifier, the program further retrieves a set of historical transactions associated with the user. The program also uses a first machine learning model to predict a string based on the image, and a second machine learning model to predict a set of digits based on the image. Based on the set of historical transactions, the string, and the set of digits, the program then determines a data value for a field in a form.
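The following Python sketch shows one plausible way to combine the three inputs. A noisy predicted string is snapped to the closest merchant name in the user's history using the standard-library `difflib.get_close_matches`. The predicted digits are read as an amount. The `(merchant, amount)` history format, the 0.6 similarity cutoff and the cents interpretation of the digits are all assumptions. The two-field output is hypothetical.

```python
import difflib


def fill_fields(predicted_name, predicted_digits, history):
    """Combine model predictions with a user's history to fill two form fields.

    history: list of (merchant, amount) pairs from the user's past transactions.
    """
    known = sorted({merchant for merchant, _ in history})
    # Snap a noisy predicted string to the closest merchant the user has used before.
    match = difflib.get_close_matches(predicted_name, known, n=1, cutoff=0.6)
    merchant = match[0] if match else predicted_name
    # Assume the predicted digits are an amount in cents, e.g. "1250" -> 12.50.
    amount = int(predicted_digits) / 100
    return {"merchant": merchant, "amount": amount}


history = [("Blue Bottle Coffee", 4.5), ("Shell", 40.0)]
print(fill_fields("Blue Botle Cofee", "1250", history))
```

If no historical merchant is similar enough, the sketch falls back to the raw predicted string.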
Semi-Supervised Anomaly Detection Under Distribution Mismatch
Aspects of the disclosure are directed to a Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling (SPADE) framework that is not limited by the assumption that labeled and unlabeled data come from the same distribution. SPADE utilizes an ensemble of one-class classifiers as the pseudo-labeler to improve the robustness of pseudo-labeling with distribution mismatch. Partial matching automatically selects critical hyper-parameters for pseudo-labeling without validation data, which is crucial with a limited amount of labeled data.
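The idea of using an ensemble of one-class classifiers as a pseudo-labeler can be sketched in Python. Each member is trained on all labeled normal data plus a disjoint slice of the unlabeled data. A point is then pseudo-labeled only when every member agrees. The centroid-distance classifier and the unanimous-vote rule are toy stand-ins. The fixed quantile threshold replaces SPADE's partial-matching hyper-parameter selection, which this sketch does not implement.

```python
import math
import random
from statistics import mean


class CentroidOneClass:
    """Toy one-class classifier: flags points far from the training centroid."""

    def __init__(self, points, quantile=0.95):
        self.center = tuple(mean(col) for col in zip(*points))
        dists = sorted(math.dist(p, self.center) for p in points)
        # Fixed quantile threshold (assumption; SPADE selects it via partial matching).
        self.threshold = dists[min(len(dists) - 1, int(quantile * len(dists)))]

    def is_anomalous(self, point):
        return math.dist(point, self.center) > self.threshold


def pseudo_label(labeled_normal, unlabeled, k=5, seed=0):
    """Return {index into unlabeled: 0 (normal) or 1 (anomaly)} where all models agree."""
    shuffled = list(unlabeled)
    random.Random(seed).shuffle(shuffled)
    # Each ensemble member sees all labeled normals plus a disjoint slice of unlabeled data.
    models = [CentroidOneClass(list(labeled_normal) + shuffled[i::k]) for i in range(k)]
    labels = {}
    for idx, point in enumerate(unlabeled):
        votes = [m.is_anomalous(point) for m in models]
        if all(votes):
            labels[idx] = 1
        elif not any(votes):
            labels[idx] = 0
        # Points without a unanimous vote stay unlabeled.
    return labels
```

Because an anomaly in the unlabeled data contaminates only one member's training slice, the other members can still flag it. This is one intuition for why an ensemble is more robust than a single pseudo-labeler.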
ABSTRACT LEARNING METHOD, ABSTRACT LEARNING APPARATUS AND PROGRAM
The efficiency of summary learning that requires an additional input parameter is improved by causing a computer to execute: a first learning step of learning a first model for calculating an importance value of each component in source text, with use of a first training data group and a second training data group, the first training data group including source text, a query related to a summary of the source text, and summary data related to the query in the source text, and the second training data group including source text and summary data generated based on the source text; and a second learning step of learning a second model for generating summary data from source text of training data, with use of each piece of training data in the second training data group and a plurality of components extracted for each piece of training data in the second training data group based on importance values calculated by the first model for components of the source text of the piece of training data.
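The bridge between the two learning steps is the extraction of high-importance components for each training example in the second group. The following Python sketch covers only that bridge. The callable `first_model` is a hypothetical stand-in for the trained first model. Treating components as words and keeping a fixed top `k` are assumptions. Training the second model is not shown.

```python
def select_components(components, importance, k):
    """Keep the k highest-scoring components, in their original source order."""
    top = sorted(range(len(components)), key=lambda i: importance[i], reverse=True)[:k]
    return [components[i] for i in sorted(top)]


def second_stage_examples(second_group, first_model, k):
    """Pair each source's selected components with its reference summary.

    second_group: list of (source_components, summary) pairs.
    first_model: callable scoring one component (stands in for the trained first model).
    """
    examples = []
    for components, summary in second_group:
        scores = [first_model(c) for c in components]
        examples.append((select_components(components, scores, k), summary))
    return examples


scores = {"the": 0.0, "cat": 1.0, "sat": 0.5}
print(second_stage_examples([(["the", "cat", "sat"], "cat sat")], scores.get, 2))
```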
SYSTEM AND METHOD FOR STATISTICAL SUBJECT IDENTIFICATION FROM INPUT DATA
Embodiments provide a system and method for statistical subject identification. The system takes texts, videos, audio, and images as input for which a subject needs to be identified. The system preprocesses the input data and generates n-grams and preprocessed text strings by removing stopwords, punctuation, and selected POS tags and by applying lemmatization. The frequency distribution of the n-grams is computed, and weights are assigned to the n-grams. For each n-gram, the sum of its weights across all text strings is computed and a maximum weight is identified; the ratio of these two values is assigned to the n-gram. Because the values computed for the n-grams are statistically non-normally distributed, they are transformed into confidence values that follow a normal distribution. The system maps the n-grams to domains using a domain lexicon. Finally, these domains are aggregated and converged for subject identification based on a pre-annotated mapping dictionary.
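The text-only part of this pipeline can be sketched in Python. Several choices here are assumptions rather than the patent's method:

- Per-text relative frequency is used as the n-gram weight.
- Dividing each n-gram's summed weight by the largest sum is one reading of "a ratio of two".
- A rank-based inverse normal transform (via `statistics.NormalDist.inv_cdf`) stands in for the normalization step.
- Summing confidences per domain stands in for aggregation.

The stopword list, lexicon and subject map are hypothetical. Lemmatization and POS filtering are omitted.

```python
import re
from collections import Counter, defaultdict
from statistics import NormalDist

STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}  # tiny illustrative list


def preprocess(text):
    # Lowercase, keep alphabetic tokens only (drops punctuation), remove stopwords.
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_scores(texts, n=1):
    """Sum each n-gram's per-text weight, then divide by the largest sum."""
    totals = Counter()
    for text in texts:
        grams = ngrams(preprocess(text), n)
        for gram, count in Counter(grams).items():
            totals[gram] += count / len(grams)  # weight = relative frequency (assumption)
    if not totals:
        return {}
    peak = max(totals.values())
    return {gram: total / peak for gram, total in totals.items()}


def to_normal_confidence(scores):
    """Rank-based inverse normal transform; ties keep insertion order."""
    ranked = sorted(scores, key=scores.get)
    dist = NormalDist()
    return {g: dist.inv_cdf((r + 0.5) / len(ranked)) for r, g in enumerate(ranked)}


def identify_subject(texts, lexicon, subject_map, n=1):
    """lexicon: n-gram -> domain; subject_map: domain -> subject label."""
    domain_totals = defaultdict(float)
    for gram, conf in to_normal_confidence(ngram_scores(texts, n)).items():
        if gram in lexicon:
            domain_totals[lexicon[gram]] += conf
    if not domain_totals:
        return None
    return subject_map.get(max(domain_totals, key=domain_totals.get))
```

With `lexicon = {"goal": "football", "striker": "football"}` and `subject_map = {"football": "Sports"}`, a handful of match reports would map to `"Sports"`. Summing confidences is only one possible aggregation rule.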