Patent classifications
G06F18/2148
Multi-fidelity simulated data for machine learning
A method of training a machine learning system. The method comprises collecting a first simulation dataset derived from a computer simulating a hypothetical scenario with a first simulation configuration having a first degree of fidelity. The method further comprises collecting a second simulation dataset derived from a computer simulating the hypothetical scenario with a second simulation configuration having a second degree of fidelity different than the first degree of fidelity. The method further comprises building a multi-fidelity training dataset including training data from both the first simulation dataset and the second simulation dataset according to an interleaving protocol.
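The abstract does not say what the interleaving protocol is. Below is a minimal sketch of one possible protocol. It assumes a fixed ratio of first-dataset samples to second-dataset samples and tags each sample with its fidelity level. The names `interleave`, `low_per_high`, and the "low"/"high" tags are hypothetical.

```python
from itertools import islice
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def interleave(low: List[T], high: List[T], low_per_high: int = 3) -> List[Tuple[str, T]]:
    """Hypothetical interleaving protocol: emit `low_per_high` samples from the
    first (lower-fidelity) dataset, then one sample from the second
    (higher-fidelity) dataset, and repeat. Once one source runs out, the
    remaining samples of the other are emitted in order. Every sample is
    tagged with its fidelity level so a model can condition on it."""
    if low_per_high < 1:
        raise ValueError("low_per_high must be >= 1")
    out: List[Tuple[str, T]] = []
    lo_it, hi_it = iter(low), iter(high)
    while True:
        lo_chunk = list(islice(lo_it, low_per_high))
        hi_chunk = list(islice(hi_it, 1))
        if not lo_chunk and not hi_chunk:
            break
        out.extend(("low", x) for x in lo_chunk)
        out.extend(("high", x) for x in hi_chunk)
    return out
```

A fixed ratio keeps the cheaper low-fidelity data from swamping the scarcer high-fidelity data within any training window. Other schedules, such as randomized mixing or curricula that shift the ratio over time, would fit the abstract equally well.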
IDENTIFYING OVERFILLED CONTAINERS
Among other things, the techniques described herein include a method for receiving a plurality of images of one or more containers while the one or more containers are being emptied, the plurality of images comprising a training set of images and a validation set of images; labeling each image of the plurality of images as including either an overfilled container or a not-overfilled container; processing each image of the plurality of images to reduce bias of a machine learning model; training the machine learning model, based on the labeling, using the plurality of images; and optimizing the machine learning model by performing learning against the validation set, the optimized machine learning model being used to generate a prediction for a new image of a container, the prediction indicating whether the container in the new image was overfilled before it was emptied.
Deep learning-based variant classifier
The technology disclosed operates directly on sequencing data and derives its own feature filters. It processes a plurality of aligned reads that span a target base position. It combines an elegant encoding of the reads with a lightweight analysis to achieve good recall and precision on lightweight hardware. For instance, a model can be trained on one million training examples of target base variant sites, with 50 to 100 reads each, on a single GPU card in less than 10 hours with good recall and precision. A single GPU card is desirable because a computer with a single GPU is inexpensive and almost universally within reach for users working with genetic data. Such hardware is also readily available on cloud-based platforms.
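The abstract does not describe its read encoding. One plausible way to give a network the aligned reads that span a target base is a one-hot "pileup" tensor with shape reads × window positions × bases. Here is a minimal sketch under that assumption. The `(start, sequence)` read format, the `flank` window, and the `max_reads` cap are all hypothetical, and real encoders usually add channels such as base quality and strand.

```python
from typing import List, Tuple

BASES = "ACGT"


def encode_pileup(reads: List[Tuple[int, str]], target: int,
                  flank: int, max_reads: int) -> List[List[List[int]]]:
    """Hypothetical encoding: one-hot each read over the window
    [target - flank, target + flank]. Each read is (start_position, sequence).
    Only reads that cover `target` are kept, at most `max_reads` of them.
    Uncovered positions, non-ACGT symbols, and padding rows stay all-zero."""
    width = 2 * flank + 1
    window_start = target - flank
    tensor = [[[0] * len(BASES) for _ in range(width)] for _ in range(max_reads)]
    spanning = [(s, seq) for s, seq in reads if s <= target < s + len(seq)]
    for row, (start, seq) in enumerate(spanning[:max_reads]):
        for col in range(width):
            offset = window_start + col - start  # index into this read
            if 0 <= offset < len(seq):
                base = seq[offset].upper()
                if base in BASES:
                    tensor[row][col][BASES.index(base)] = 1
    return tensor
```

A fixed-size tensor like this lets a small convolutional model process every candidate site in the same way, which fits the abstract's emphasis on lightweight, single-GPU training.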
Uplift modeling
A method includes training a plurality of different types of machine learning models using a training dataset to produce a set of trained machine learning models and determining a lift of each trained machine learning model in the set of trained machine learning models using a validation dataset. The method also includes selecting the trained machine learning model that has the highest lift in the set of trained machine learning models and predicting a likelihood that a person will perform an action by applying the selected trained machine learning model to data about the person.
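The abstract does not define "lift". This minimal sketch assumes the common marketing definition: the positive rate among the top-scored fraction of the validation set divided by the overall positive rate. It then picks the model with the highest lift. The callable `Model` interface, the `fraction` parameter, and the function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model interface: a callable mapping a feature vector to a score.
Model = Callable[[Sequence[float]], float]


def top_fraction_lift(scores: List[float], labels: List[int], fraction: float = 0.1) -> float:
    """Positive rate in the top `fraction` of examples by score, divided by
    the overall positive rate (labels are 1 = performed the action, 0 = not)."""
    n = len(scores)
    k = max(1, int(n * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / n
    return top_rate / base_rate if base_rate else 0.0


def select_by_lift(models: Dict[str, Model], X_val: List[Sequence[float]],
                   y_val: List[int], fraction: float = 0.1) -> Tuple[str, Dict[str, float]]:
    """Score the validation set with every trained model and return the name
    of the model with the highest lift, along with all lifts."""
    lifts = {name: top_fraction_lift([model(x) for x in X_val], y_val, fraction)
             for name, model in models.items()}
    best = max(lifts, key=lambda name: lifts[name])
    return best, lifts
```

After selection, `models[best](features)` would be applied to a new person's data to estimate the likelihood of the action.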
SYSTEM AND METHOD FOR ASSESSING A CANCER STATUS OF BIOLOGICAL TISSUE
A method for assessing a cancer status of biological tissue includes the steps of: obtaining a Raman spectrum indicating a Raman spectroscopy response of the biological tissue, the Raman spectrum captured using a fiber-optic probe of a fiber-optic Raman spectroscopy system; inputting the Raman spectrum into a boosted tree classification algorithm of a computer program, and using the boosted tree classification algorithm for comparing, in real-time, the captured Raman spectrum to reference data and assessing the cancer status of the biological tissue based on said comparison, the reference data being previously determined based on a set of reference Raman spectra indicating Raman spectroscopy responses of reference biological tissues wherein each of the reference biological tissues is associated with a known cancer status; and generating a real-time output indicating the assessed cancer status of the biological tissue.
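The abstract names a boosted tree classifier but not a specific algorithm. As an illustration, this sketch uses discrete AdaBoost over decision stumps (depth-one trees), a classic boosted-tree method and not necessarily the one the patent uses. Each spectrum is treated as a vector of intensities at fixed wavenumbers, and the reference tissues carry hypothetical labels: +1 for cancerous, -1 for non-cancerous. It is written in plain Python for readability; a library implementation would normally be used in practice.

```python
import math
from typing import List, Tuple

# A stump is (feature index, threshold, polarity): it predicts `polarity`
# when x[feature] > threshold and -polarity otherwise.
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: List[float]) -> int:
    j, thr, pol = stump
    return pol if x[j] > thr else -pol


def fit_adaboost(X: List[List[float]], y: List[int], rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost over decision stumps. Labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        # Exhaustively pick the stump with the lowest weighted error.
        best: Stump = (0, 0.0, 1)
        best_err = float("inf")
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict((j, thr, pol), xi) != yi)
                    if err < best_err:
                        best, best_err = (j, thr, pol), err
        # Clip to avoid log(0) and division by zero on a perfect stump.
        best_err = min(max(best_err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Tuple[float, Stump]], x: List[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

In this sketch the reference spectra act as the training set. Classifying a new spectrum is then a cheap weighted vote over stumps, which is consistent with the real-time requirement.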
System and methods for mammalian transfer learning
A neural network is trained using transfer learning to analyze medical image data, including 2D, 3D, and 4D images and models. Where the target medical image data is associated with a species or problem class for which there is not sufficient labeled data available for training, the system may create enhanced training datasets by selecting labeled data from other species, and/or labeled data from different problem classes. During training and analysis, image data is chunked into portions that are small enough to obfuscate the species source, while being large enough to preserve meaningful context related to the problem class (e.g., the image portion is small enough that it can't be determined whether it is from a human or canine, but abnormal liver tissues are still identifiable). A trained checkpoint may then be used to provide automated analysis and heat mapping of input images via a cloud platform or other application.
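The chunking step can be sketched for the 2D case as cutting an image into fixed-size square tiles. The `tile` and `stride` values are hypothetical. The abstract only says the portions must be small enough to hide the species and large enough to keep problem-relevant context, so the right size is problem-specific. 3D and 4D data would add one loop per extra axis.

```python
from typing import List

Image = List[List[float]]


def chunk_image(img: Image, tile: int, stride: int) -> List[Image]:
    """Split a 2D image (list of rows) into tile x tile patches, moving
    `stride` pixels between patches. Border regions too small for a full
    tile are skipped."""
    if tile < 1 or stride < 1:
        raise ValueError("tile and stride must be >= 1")
    h, w = len(img), len(img[0])
    patches: List[Image] = []
    for top in range(0, h - tile + 1, stride):
        for left in range(0, w - tile + 1, stride):
            patches.append([row[left:left + tile] for row in img[top:top + tile]])
    return patches
```

With `stride == tile` the patches do not overlap. A smaller stride gives overlapping patches, which could later be combined into the per-pixel heat maps mentioned above.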
Semantic image segmentation using gated dense pyramid blocks
An example apparatus for semantic image segmentation includes a receiver to receive an image to be segmented. The apparatus also includes a gated dense pyramid network including a plurality of gated dense pyramid (GDP) blocks to be trained to generate semantic labels for respective pixels in the received image. The apparatus further includes a generator to generate a segmented image based on the generated semantic labels.
Data model generation using generative adversarial networks
Methods for generating data models using a generative adversarial network can begin by receiving a data model generation request by a model optimizer from an interface. The model optimizer can provision computing resources with a data model. As a further step, a synthetic dataset for training the data model can be generated using a generative network of a generative adversarial network, the generative network trained to generate output data differing at least a predetermined amount from a reference dataset according to a similarity metric. The computing resources can train the data model using the synthetic dataset. The model optimizer can evaluate performance criteria of the data model and, based on the evaluation of the performance criteria of the data model, store the data model and metadata of the data model in a model storage. The data model can then be used to process production data.
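In the abstract, the generator itself is trained to produce data that differs from the reference dataset by at least a predetermined amount. The simpler, post-hoc version of that constraint is sketched here: drop any synthetic record whose nearest reference record is closer than a threshold. Euclidean distance stands in for the unspecified similarity metric, and `min_diff` is a hypothetical parameter. This is a filter applied after generation, not the training-time mechanism the abstract describes.

```python
import math
from typing import List, Sequence


def min_distance(record: Sequence[float], reference: List[Sequence[float]]) -> float:
    """Euclidean distance from `record` to its nearest reference record."""
    return min(math.dist(record, ref) for ref in reference)


def filter_synthetic(synthetic: List[Sequence[float]],
                     reference: List[Sequence[float]],
                     min_diff: float) -> List[Sequence[float]]:
    """Keep only synthetic records at least `min_diff` away from every
    reference record, so no reference record is reproduced too closely."""
    return [r for r in synthetic if min_distance(r, reference) >= min_diff]
```

The nearest-neighbour check is O(|synthetic| × |reference|), which works for a sketch. Large datasets would need an index structure or an approximate nearest-neighbour search.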
SYSTEMS AND METHODS FOR COLLABORATIVE FILTERING WITH VARIATIONAL AUTOENCODERS
Collaborative filtering systems based on variational autoencoders (VAEs) are provided. VAEs may be trained on row-wise data without necessarily training a paired VAE on column-wise data (or vice-versa), and may optionally be trained via minibatches. The row-wise VAE models the output of the corresponding column-based VAE as a set of parameters and uses these parameters in decoding. In some implementations, a paired VAE is provided which receives column-wise data and models row-wise parameters; each of the paired VAEs may bind their learned column- or row-wise parameters to the output of the corresponding VAE. The paired VAEs may optionally be trained via minibatches. Unobserved data may be explicitly modelled. Methods for performing inference with such VAE-based collaborative filtering systems are also disclosed, as are example applications to search and anomaly detection.
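For reference, a row-wise VAE of this kind would typically be trained by maximizing the standard evidence lower bound (ELBO) for each row $\mathbf{x}_u$ (for example, user $u$'s interactions). In the sketch below, $V$ is a hypothetical symbol for the column-wise parameters that, per the abstract, the decoder consumes. The minibatch form averages the per-row bound over a batch $B$. This is the generic VAE objective, not necessarily the exact loss the patent uses.

```latex
\mathcal{L}(\theta, \phi; \mathbf{x}_u)
  = \mathbb{E}_{q_\phi(\mathbf{z}_u \mid \mathbf{x}_u)}
      \left[ \log p_\theta(\mathbf{x}_u \mid \mathbf{z}_u, V) \right]
  - \mathrm{KL}\left( q_\phi(\mathbf{z}_u \mid \mathbf{x}_u) \,\|\, p(\mathbf{z}_u) \right)

\mathcal{L}_B(\theta, \phi) = \frac{1}{|B|} \sum_{u \in B} \mathcal{L}(\theta, \phi; \mathbf{x}_u)
```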
Model Management System for Developing Machine Learning Models
Provided is a system for developing a geographic agnostic machine learning model. The system may select transaction data associated with payment transactions conducted by a plurality of users, wherein the transaction data includes first transaction data associated with payment transactions conducted by a first plurality of users in a first geographic area and second transaction data associated with payment transactions conducted by a second plurality of users in a second geographic area, normalize the first transaction data associated with payment transactions conducted by the first plurality of users in the first geographic area and the second transaction data associated with payment transactions conducted by the second plurality of users in the second geographic area to provide training data, generate a machine learning model using the training data, and determine a classification of an input using the machine learning model. A method and computer program product are also disclosed.
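The abstract does not say how the per-area transaction data is normalized. Below is a minimal sketch of one common choice: z-scoring a numeric field, here a hypothetical transaction `amount`, separately within each geographic area. After this step, typical spending in each region is expressed on the same scale. The `(region, amount)` row format is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def normalize_by_region(rows: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """rows are (region, amount) pairs. Returns (region, z) where z is the
    amount standardized with that region's own mean and population std dev.
    Regions with zero spread map to 0.0."""
    by_region: Dict[str, List[float]] = defaultdict(list)
    for region, amount in rows:
        by_region[region].append(amount)
    stats = {r: (mean(v), pstdev(v)) for r, v in by_region.items()}
    out: List[Tuple[str, float]] = []
    for region, amount in rows:
        mu, sigma = stats[region]
        out.append((region, (amount - mu) / sigma if sigma else 0.0))
    return out
```

Per-region standardization removes differences such as currency scale or local price levels. This lets a single model learn patterns that carry across areas instead of learning where a transaction happened.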