Patent classifications
G06V10/7753
TRAINING IN NEURAL NETWORKS
A system and method comprising: obtaining a first training dataset comprising a plurality of first image and pose data pairs; obtaining a first generated dataset comprising a plurality of first image and estimated pose data pairs, wherein the estimated pose data of the first image and estimated pose data pairs are generated by a first neural network trained using the first training dataset; obtaining a second generated dataset comprising a plurality of second image and estimated pose data pairs, wherein the estimated pose data of the second image and estimated pose data pairs are generated by a second neural network trained using the first training dataset; generating, from the first and second generated datasets, a generated training dataset comprising image and estimated pose data pairs selected from said first and second generated datasets; and training a third neural network based on a combination of some or all of the first training dataset and the generated training dataset.
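The data flow described above can be illustrated with a minimal sketch: two teacher networks produce estimated poses for images, a subset of the resulting pairs is selected, and a third network is trained on the union of the original and generated pairs. All function names and the trivial selection criterion below are illustrative assumptions, not the patent's implementation.

    # Minimal sketch of the described pipeline (hypothetical helpers, not the patent's code).
    from typing import Callable, List, Tuple

    Image = List[float]   # stand-in for image data
    Pose = List[float]    # stand-in for pose data

    def build_generated_dataset(images: List[Image],
                                teacher: Callable[[Image], Pose]) -> List[Tuple[Image, Pose]]:
        """Pair each image with the pose estimated by a teacher network."""
        return [(img, teacher(img)) for img in images]

    def select_pairs(pairs: List[Tuple[Image, Pose]],
                     keep: Callable[[Tuple[Image, Pose]], bool]) -> List[Tuple[Image, Pose]]:
        """Keep only pairs passing some quality criterion (e.g., teacher agreement)."""
        return [p for p in pairs if keep(p)]

    def train_student(dataset: List[Tuple[Image, Pose]]) -> None:
        """Placeholder for supervised training of the third (student) network."""
        for image, pose in dataset:
            pass  # one optimisation step per pair in a real implementation

    # Usage: combine the original labelled pairs with selected generated pairs.
    first_training_dataset = [([0.1, 0.2], [1.0, 0.0])]        # (image, pose) pairs
    unlabelled_images = [[0.3, 0.4], [0.5, 0.6]]
    teacher_a = lambda img: [sum(img), 0.0]                     # first trained network
    teacher_b = lambda img: [sum(img), 0.1]                     # second trained network

    generated = (build_generated_dataset(unlabelled_images, teacher_a)
                 + build_generated_dataset(unlabelled_images, teacher_b))
    generated_training_dataset = select_pairs(generated, keep=lambda p: True)
    train_student(first_training_dataset + generated_training_dataset)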
Depth data model training with upsampling, losses, and loss balancing
Techniques for training a machine learned (ML) model to determine depth data based on image data are discussed herein. Training can use stereo image data and depth data (e.g., lidar data). A first (e.g., left) image can be input to an ML model, which can output predicted disparity and/or depth data. The predicted disparity data can be used with second image data (e.g., a right image) to reconstruct the first image. Differences between the first and reconstructed images can be used to determine a loss. Losses may include pixel, smoothing, structural similarity, and/or consistency losses. Further, differences between the depth data and the predicted depth data and/or differences between the predicted disparity data and the predicted depth data can be determined, and the ML model can be trained based on the various losses. Thus, the techniques can use self-supervised training and supervised training to train an ML model.
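As a rough illustration of the loss terms mentioned above, the NumPy-only sketch below combines a pixel reconstruction loss between the first image and an image reconstructed from the predicted disparity, a smoothness loss on the disparity, and a supervised loss against lidar depth. The warping scheme, loss weights, and function names are assumptions for illustration only.

    # Minimal sketch of self-supervised + supervised depth losses (assumed weights).
    import numpy as np

    def reconstruct_left(right: np.ndarray, disparity: np.ndarray) -> np.ndarray:
        """Warp the right image toward the left image using predicted disparity (nearest pixel)."""
        h, w = right.shape
        cols = np.clip(np.arange(w)[None, :] - np.round(disparity).astype(int), 0, w - 1)
        return right[np.arange(h)[:, None], cols]

    def pixel_loss(left, reconstructed):
        return np.abs(left - reconstructed).mean()

    def smoothness_loss(disparity):
        return np.abs(np.diff(disparity, axis=0)).mean() + np.abs(np.diff(disparity, axis=1)).mean()

    def supervised_depth_loss(pred_depth, lidar_depth, valid_mask):
        return np.abs(pred_depth[valid_mask] - lidar_depth[valid_mask]).mean()

    # Example combined training loss with assumed weighting factors.
    left = np.random.rand(4, 8); right = np.roll(left, 1, axis=1)
    disparity = np.ones((4, 8)); pred_depth = 1.0 / (disparity + 1e-6)
    lidar = np.full((4, 8), 1.0); mask = np.ones((4, 8), bool)
    loss = (pixel_loss(left, reconstruct_left(right, disparity))
            + 0.1 * smoothness_loss(disparity)
            + supervised_depth_loss(pred_depth, lidar, mask))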
SYSTEMS AND METHODS FOR CONTRASTIVE LEARNING OF VISUAL REPRESENTATIONS
Systems, methods, and computer program products for performing semi-supervised contrastive learning of visual representations are provided. For example, the present disclosure provides systems and methods that leverage particular data augmentation schemes and a learnable nonlinear transformation between the representation and the contrastive loss to provide improved visual representations. Further, the present disclosure also provides improvements for semi-supervised contrastive learning. For example, a computer-implemented method may include performing semi-supervised contrastive learning based on a set of one or more unlabeled training data, generating an image classification model based on a portion of a plurality of layers in a projection head neural network used in performing the contrastive learning, performing fine-tuning of the image classification model based on a set of one or more labeled training data, and after performing the fine-tuning, distilling the image classification model to a student model comprising a relatively smaller number of parameters than the image classification model.
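For intuition, the sketch below implements a normalized-temperature cross-entropy (NT-Xent) contrastive loss over two augmented views of each example, the kind of objective used in the contrastive pretraining stage described above. The shapes, temperature value, and toy random projections are illustrative assumptions, not the disclosed method.

    # Minimal NumPy sketch of an NT-Xent contrastive loss over two augmented views.
    import numpy as np

    def nt_xent(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
        """z1, z2: (batch, dim) projections of two augmentations of the same batch."""
        z = np.concatenate([z1, z2], axis=0)
        z = z / np.linalg.norm(z, axis=1, keepdims=True)            # unit-normalise
        sim = z @ z.T / temperature                                  # pairwise similarities
        n = z1.shape[0]
        np.fill_diagonal(sim, -np.inf)                               # exclude self-similarity
        positives = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # index of each positive
        log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
        return float(-log_prob[np.arange(2 * n), positives].mean())

    # Usage: projections from a projection head applied to two augmented views.
    rng = np.random.default_rng(0)
    loss = nt_xent(rng.normal(size=(8, 16)), rng.normal(size=(8, 16)))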
AUTOMATICALLY SELECTING QUERY OBJECTS IN DIGITAL IMAGES
The present disclosure relates to an object selection system that automatically detects and selects objects in a digital image utilizing a large-scale object detector. For instance, in response to receiving a request to automatically select a query object with an unknown object class in a digital image, the object selection system can utilize a large-scale object detector to detect potential objects in the image, filter out one or more potential objects, and label the remaining potential objects in the image to detect the query object. In some implementations, the large-scale object detector utilizes a region proposal model, a concept mask model, and an auto tagging model to automatically detect objects in the digital image.
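The detection flow described above can be sketched as a simple pipeline: propose candidate regions, filter unlikely ones using a concept mask score, tag the remaining regions, and return the regions whose tag matches the query. All component functions below are hypothetical placeholders standing in for the region proposal, concept mask, and auto tagging models.

    # Minimal sketch of selecting a query object of unknown class (placeholder models).
    from typing import List, Tuple

    Region = Tuple[int, int, int, int]  # x, y, width, height

    def propose_regions(image) -> List[Region]:
        return [(0, 0, 50, 50), (60, 10, 40, 40)]       # stand-in region proposals

    def concept_mask_score(image, region: Region) -> float:
        return 0.9                                       # stand-in objectness score

    def auto_tag(image, region: Region) -> str:
        return "lamp"                                    # stand-in tagging model

    def select_query_object(image, query: str, threshold: float = 0.5) -> List[Region]:
        regions = [r for r in propose_regions(image) if concept_mask_score(image, r) >= threshold]
        return [r for r in regions if auto_tag(image, r) == query.lower()]

    print(select_query_object(image=None, query="lamp"))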
User interface configured to facilitate user annotation for instance segmentation within biological samples
Novel tools and techniques are provided for implementing digital microscopy imaging using deep learning-based segmentation via multiple regression layers, implementing instance segmentation based on partial annotations, and/or implementing a user interface configured to facilitate user annotation for instance segmentation. In various embodiments, a computing system might generate a user interface configured to collect training data for predicting instance segmentation within biological samples, and might display, within a display portion of the user interface, a first image comprising a field of view of a biological sample. The computing system might receive, from a user via the user interface, first user input indicating a centroid for each of a first plurality of objects of interest and second user input indicating a border around each of the first plurality of objects of interest. The computing system might train an AI system to predict instance segmentation of objects of interest in images of biological samples.
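One way to picture the two kinds of user input described above (a centroid and a border per object of interest) is as per-object annotation records attached to the displayed field of view. The field and class names below are illustrative assumptions, not the disclosed data model.

    # Minimal sketch of storing centroid + border annotations as training data.
    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class ObjectAnnotation:
        centroid: Tuple[float, float]          # user-indicated centre of the object
        border: List[Tuple[float, float]]      # user-drawn outline vertices

    @dataclass
    class AnnotatedFieldOfView:
        image_path: str                        # first image shown in the display portion
        objects: List[ObjectAnnotation] = field(default_factory=list)

    fov = AnnotatedFieldOfView("sample_001.tif")
    fov.objects.append(ObjectAnnotation(centroid=(120.5, 88.0),
                                        border=[(110, 80), (130, 80), (130, 96), (110, 96)]))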
SELECTING UNLABELED DATA OBJECTS TO BE PROCESSED
Systems and methods for selecting at least one unlabeled data object from a set of unlabeled data objects. The present invention receives a set of unlabeled data objects and identifies at least one data object in the set that is considered to differ from the others. The at least one data object is selected for further processing, which may include labeling processes. In some embodiments, the data objects are passed through at least one representation-generating module, and the resulting representations are compared to each other. Differences between the representations are evaluated against at least one criterion. If the differences meet the at least one criterion, corresponding data objects are considered to differ from the others and are then selected for further processing. In some implementations, a sample set of sample data objects may be used. In some implementations, the at least one representation-generating module may comprise a neural network.
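The selection idea described above can be sketched briefly: pass each unlabeled object through a representation-generating module, compare the resulting representations, and select objects whose representation differs from the rest by more than some criterion. The identity embedding and distance-from-mean criterion below are illustrative assumptions.

    # Minimal sketch of selecting "differing" unlabeled objects for labeling.
    import numpy as np

    def representation(obj: np.ndarray) -> np.ndarray:
        return obj  # stand-in for a neural-network representation-generating module

    def select_differing(objects: list, threshold: float) -> list:
        reps = np.stack([representation(o) for o in objects])
        centre = reps.mean(axis=0)                             # compare each against the others
        distances = np.linalg.norm(reps - centre, axis=1)
        return [o for o, d in zip(objects, distances) if d > threshold]

    objects = [np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([5.0, 5.0])]
    to_label = select_differing(objects, threshold=1.0)        # the outlier is selected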
METHOD AND APPARATUS FOR GENERATING TARGET RE-RECOGNITION MODEL AND RE-RECOGNIZING TARGET
A method, an apparatus, a device and a storage medium for generating a target re-recognition model are provided. The method may include: acquiring a set of labeled samples, a set of unlabeled samples and an initialization model obtained through supervised training; performing feature extraction on each sample in the set of the unlabeled samples by using the initialization model; clustering features extracted from the set of the unlabeled samples by using a clustering algorithm; assigning, for each sample in the set of the unlabeled samples, a pseudo label to the sample according to a cluster corresponding to the sample in a feature space; and mixing a set of samples with a pseudo label and the set of the labeled samples as a set of training samples, and performing supervised training on the initialization model to obtain a target re-recognition model.
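The pseudo-labelling step above can be sketched as follows: extract features from the unlabeled samples with the initialization model, cluster them, assign each sample its cluster index as a pseudo label, and merge the result with the labeled set for another round of supervised training. KMeans stands in here for the unspecified clustering algorithm, and the feature extractor and label-space handling are assumptions.

    # Minimal sketch of cluster-based pseudo-labelling for re-recognition training.
    import numpy as np
    from sklearn.cluster import KMeans

    def extract_features(samples: np.ndarray) -> np.ndarray:
        return samples  # stand-in for the initialization model's feature extractor

    labeled_x = np.random.rand(10, 4); labeled_y = np.random.randint(0, 3, size=10)
    unlabeled_x = np.random.rand(20, 4)

    features = extract_features(unlabeled_x)
    pseudo_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)

    # Mixed training set for a further round of supervised training
    # (pseudo labels offset so they do not collide with real identity labels).
    train_x = np.concatenate([labeled_x, unlabeled_x])
    train_y = np.concatenate([labeled_y, pseudo_labels + labeled_y.max() + 1])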
DOMAIN ADAPTATION USING POST-PROCESSING MODEL CORRECTION
Techniques are described for domain adaptation of image processing models using post-processing model correction. According to an embodiment, a method comprises training, by a system operatively coupled to a processor, a post-processing model to correct an image-based inference output of a source image processing model that results from application of the source image processing model to a target image from a target domain that differs from a source domain, wherein the source image processing model was trained on source images from the source domain. In one or more implementations, the source image processing model comprises an organ segmentation model and the post-processing model can comprise a shape autoencoder. The method further comprises applying, by the system, the source image processing model and the post-processing model to target images from the target domain to generate optimized image-based inference outputs for the target images.
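The two-stage inference described above amounts to chaining the source-domain model and the correction model on each target image. Both models in the sketch below are hypothetical placeholders: a thresholding "segmentation" model and a clipping "shape autoencoder" used only to show the composition.

    # Minimal sketch of post-processing model correction at inference time.
    import numpy as np

    def source_segmentation_model(image: np.ndarray) -> np.ndarray:
        return (image > image.mean()).astype(float)       # stand-in organ segmentation

    def shape_autoencoder(mask: np.ndarray) -> np.ndarray:
        return np.clip(mask, 0.0, 1.0)                    # stand-in shape correction

    def infer_on_target(image: np.ndarray) -> np.ndarray:
        raw_mask = source_segmentation_model(image)       # inference output on the target domain
        return shape_autoencoder(raw_mask)                # corrected (optimized) output

    corrected = infer_on_target(np.random.rand(64, 64))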
Video action segmentation by mixed temporal domain adaptation
Embodiments herein treat action segmentation as a domain adaptation (DA) problem and reduce the domain discrepancy by performing unsupervised DA with auxiliary unlabeled videos. In one or more embodiments, to reduce domain discrepancy for both the spatial and temporal directions, embodiments of a Mixed Temporal Domain Adaptation (MTDA) approach are presented to jointly align frame-level and video-level embedded feature spaces across domains, and, in one or more embodiments, further integrate with a domain attention mechanism to focus on aligning the frame-level features with higher domain discrepancy, leading to more effective domain adaptation. Comprehensive experimental results validate that embodiments outperform previous state-of-the-art methods. Embodiments can adapt models effectively by using auxiliary unlabeled videos, enabling further applications to large-scale problems, such as video surveillance and human activity analysis.
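A rough sketch of the kind of joint objective suggested above: a task loss on labeled source videos plus frame-level and video-level domain-alignment terms, with a domain-attention weight emphasising frames whose domain discrepancy is higher. All functions, the mean-feature discrepancy measure, and the weights are illustrative assumptions, not the patented formulation.

    # Minimal sketch of a combined MTDA-style loss (assumed terms and weights).
    import numpy as np

    def task_loss(source_video_feats, source_labels):
        return 1.0                                                   # stand-in classification loss

    def domain_discrepancy(feats_a, feats_b):
        return float(np.linalg.norm(feats_a.mean(axis=0) - feats_b.mean(axis=0)))

    def mtda_loss(src_frames, tgt_frames, src_video, tgt_video, labels,
                  lam_frame=0.5, lam_video=0.5):
        # Domain attention: weight each source frame by its (normalised) domain discrepancy.
        per_frame = np.array([domain_discrepancy(f[None], tgt_frames) for f in src_frames])
        attention = per_frame / (per_frame.sum() + 1e-8)
        frame_term = float((attention * per_frame).sum())            # frame-level alignment
        video_term = domain_discrepancy(src_video, tgt_video)        # video-level alignment
        return task_loss(src_video, labels) + lam_frame * frame_term + lam_video * video_term

    loss = mtda_loss(np.random.rand(8, 16), np.random.rand(8, 16),
                     np.random.rand(1, 16), np.random.rand(1, 16), labels=[0])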
Data augmentation for image classification tasks
A computer-implemented method and systems are provided for performing machine learning for an image classification task. The method includes overlaying, by a processor operatively coupled to one or more databases, a second image on a first image obtained from one or more training sets in the one or more databases, to form a mixed image, by averaging an intensity of each of a plurality of co-located pixel pairs in the first and second images. The method also includes training, by the processor, a machine learning process configured for the image classification task using the mixed image to augment data used by the machine learning process for the image classification task.
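The mixing step described above reduces to averaging the intensities of co-located pixels of two training images. The sketch below shows that operation; the array shapes and uint8 handling are illustrative assumptions.

    # Minimal sketch of forming a mixed image by per-pixel intensity averaging.
    import numpy as np

    def mix_images(first: np.ndarray, second: np.ndarray) -> np.ndarray:
        """Average each co-located pixel pair of two same-sized images."""
        return ((first.astype(np.float32) + second.astype(np.float32)) / 2.0).astype(first.dtype)

    first = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
    second = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
    mixed = mix_images(first, second)   # used to augment the training set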