Patent classifications
G06V10/7753
TRAINING METHOD FOR SEMI-SUPERVISED LEARNING MODEL, IMAGE PROCESSING METHOD, AND DEVICE
Embodiments of this application disclose a training method for a semi-supervised learning model, which can be applied to computer vision in the field of artificial intelligence. The method includes: first predicting classification categories of some unlabeled samples by using a trained first semi-supervised learning model, to obtain a prediction label; and determining whether each prediction label is correct in a one-bit labeling manner, where if the prediction is correct, a correct label (a positive label) of the sample is obtained, or if the prediction is incorrect, an incorrect label (a negative label) of the sample is excluded. Then, in a next training phase, a training set (a first training set) is reconstructed based on this information, and an initial semi-supervised learning model is retrained based on the first training set, to improve prediction accuracy of the model. In one-bit labeling, an annotator only needs to answer “yes” or “no” to the prediction label.
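As an illustrative sketch (not part of the disclosed method), the one-bit annotation step can be simulated as follows; the class count, the simulated accuracy rate, and the helper name `one_bit_annotate` are assumptions for demonstration only:

```python
import numpy as np

def one_bit_annotate(pred_labels, true_labels):
    """Simulate one-bit labeling: for each sample, the annotator answers
    'yes'/'no' to the predicted class. Correct predictions become positive
    labels; incorrect ones become negative labels (classes known to be
    excludable when reconstructing the first training set)."""
    positives, negatives = {}, {}
    for i, (pred, true) in enumerate(zip(pred_labels, true_labels)):
        if pred == true:            # annotator answers "yes"
            positives[i] = pred     # confirmed correct label
        else:                       # annotator answers "no"
            negatives[i] = pred     # this class can be excluded
    return positives, negatives

# Hypothetical predictions from a first-stage semi-supervised model,
# correct about 70% of the time on 100 unlabeled samples with 10 classes.
rng = np.random.default_rng(0)
true_labels = rng.integers(0, 10, size=100)
pred_labels = np.where(rng.random(100) < 0.7, true_labels,
                       rng.integers(0, 10, size=100))
pos, neg = one_bit_annotate(pred_labels, true_labels)
```

The reconstructed training set would then combine the confirmed (positive) samples with samples whose predicted class is now known to be wrong.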
System and Method with Visual Concept Framework for Self-Supervised Semantic Segmentation
A computer-implemented system and method include generating first pseudo segment data from a first augmented image and generating second pseudo segment data from a second augmented image. The first augmented image and the second augmented image are in a dataset along with other augmented images. A machine learning system is configured to generate pixel embeddings based on the dataset. The first pseudo segment data and the second pseudo segment data are used to identify a first set of segments to which a given pixel belongs with respect to the first augmented image and the second augmented image. A second set of segments is identified across the dataset; the second set of segments does not include the given pixel. A local segmentation loss is computed for the given pixel based on the corresponding pixel embedding, attracting the first set of segments while repelling the second set of segments. A global concept loss is computed based on a similarity determination between the first set of segments and a concept vector of a corresponding concept. The corresponding concept categorizes the first set of segments with other sets of segments across the dataset based on semantic meaning. The parameters of the machine learning system are updated based on a total loss that takes into account at least the local segmentation loss and the global concept loss.
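The attract/repel structure of the local segmentation loss can be sketched as an InfoNCE-style objective; the temperature value, the use of segment prototype vectors, and the mean reduction are illustrative assumptions rather than the claimed formulation:

```python
import numpy as np

def local_segmentation_loss(pixel_emb, pos_segments, neg_segments, tau=0.1):
    """InfoNCE-style sketch for one pixel: attract segment prototypes that
    contain the pixel, repel segments that do not.
    pixel_emb: (D,); pos_segments: (P, D); neg_segments: (N, D)."""
    norm = lambda x: x / np.linalg.norm(x, axis=-1, keepdims=True)
    p = norm(pos_segments) @ norm(pixel_emb) / tau   # (P,) positive logits
    n = norm(neg_segments) @ norm(pixel_emb) / tau   # (N,) negative logits
    logits = np.concatenate([p, n])
    m = logits.max()
    log_denom = np.log(np.exp(logits - m).sum()) + m  # stable log-sum-exp
    # cross-entropy averaged over the positive segments
    return float((log_denom - p).mean())

rng = np.random.default_rng(0)
loss = local_segmentation_loss(rng.normal(size=16),
                               rng.normal(size=(3, 16)),
                               rng.normal(size=(5, 16)))
```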
METHOD OF UNSUPERVISED DOMAIN ADAPTATION IN ORDINAL REGRESSION
A method of jointly training a transferable feature extractor network, an ordinal regressor network, and an order classifier network in an ordinal regression unsupervised domain adaptation network by: providing a source of labeled source images and unlabeled target images; outputting image representations from the transferable feature extractor network by performing a minimax optimization procedure on the labeled source images and unlabeled target images; training a domain discriminator network, using the image representations from the transferable feature extractor network, to distinguish between source images and target images; training the ordinal regressor network using a full set of source images from the transferable feature extractor network; and training the order classifier network using a full set of source images from said transferable feature extractor network.
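The minimax step can be illustrated with a toy linear domain discriminator; the binary cross-entropy form and the sign flip standing in for gradient reversal are assumptions for illustration, not the claimed procedure:

```python
import numpy as np

def domain_adversarial_losses(src_feats, tgt_feats, disc_w):
    """Toy linear domain discriminator for the minimax step: it is trained
    to output 1 for source features and 0 for target features, while the
    feature extractor is updated to maximize that same loss (gradient
    reversal, represented here by negating the loss value)."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    p_src = sigmoid(src_feats @ disc_w)   # P(domain = source)
    p_tgt = sigmoid(tgt_feats @ disc_w)
    eps = 1e-9                            # numerical stability
    disc_loss = -(np.log(p_src + eps).mean()
                  + np.log(1.0 - p_tgt + eps).mean())
    feat_loss = -disc_loss                # extractor objective (reversed)
    return float(disc_loss), float(feat_loss)

rng = np.random.default_rng(1)
d_loss, f_loss = domain_adversarial_losses(rng.normal(size=(8, 4)),
                                           rng.normal(size=(8, 4)),
                                           rng.normal(size=4))
```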
Realistic neural network based image style transfer
A mobile device can implement a neural network-based style transfer scheme to modify an image in a first style to a second style. The style transfer scheme can be configured to detect an object in the image, apply an effect to the image, and blend the image using color space adjustments and blending schemes to generate a realistic result image. The style transfer scheme can further be configured to execute efficiently on the resource-constrained mobile device by removing operational layers based on resources available on the mobile device.
Depth data model training with upsampling, losses and loss balancing
Techniques for training a machine-learned (ML) model to determine depth data based on image data are discussed herein. Training can use stereo image data and depth data (e.g., lidar data). A first (e.g., left) image can be input to an ML model, which can output predicted disparity and/or depth data. The predicted disparity data can be used with second image data (e.g., a right image) to reconstruct the first image. Differences between the first and reconstructed images can be used to determine a loss. Losses may include pixel, smoothing, structural similarity, and/or consistency losses. Further, differences between the depth data and the predicted depth data and/or differences between the predicted disparity data and the predicted depth data can be determined, and the ML model can be trained based on the various losses. Thus, the techniques can use self-supervised training and supervised training to train an ML model.
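A minimal sketch of the loss balancing described above, assuming L1 photometric and supervised terms and a simple gradient-based smoothness term; the weights and exact loss forms are illustrative, not the patented formulation:

```python
import numpy as np

def total_depth_loss(left, recon_left, pred_depth, lidar_depth, lidar_mask,
                     w_pixel=1.0, w_smooth=0.1, w_sup=1.0):
    """Illustrative loss balancing for self-/supervised depth training.
    left, recon_left: (H, W) images; pred_depth, lidar_depth: (H, W);
    lidar_mask: boolean (H, W), True where lidar returns exist."""
    # Photometric (pixel) loss: compare the left image with its
    # reconstruction warped from the right image via predicted disparity.
    pixel = np.abs(left - recon_left).mean()
    # Smoothness sketch: penalize depth gradients in both directions.
    smooth = (np.abs(np.diff(pred_depth, axis=0)).mean()
              + np.abs(np.diff(pred_depth, axis=1)).mean())
    # Supervised loss against sparse lidar depth.
    sup = np.abs(pred_depth[lidar_mask] - lidar_depth[lidar_mask]).mean()
    return w_pixel * pixel + w_smooth * smooth + w_sup * sup

rng = np.random.default_rng(0)
left, recon = rng.random((4, 4)), rng.random((4, 4))
depth, lidar = rng.random((4, 4)) * 10, rng.random((4, 4)) * 10
mask = np.zeros((4, 4), dtype=bool)
mask[::2, ::2] = True                   # sparse lidar returns
loss = total_depth_loss(left, recon, depth, lidar, mask)
```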
Automatically selecting query objects in digital images
The present disclosure relates to an object selection system that automatically detects and selects objects in a digital image utilizing a large-scale object detector. For instance, in response to receiving a request to automatically select a query object with an unknown object class in a digital image, the object selection system can utilize a large-scale object detector to detect potential objects in the image, filter out one or more potential objects, and label the remaining potential objects in the image to detect the query object. In some implementations, the large-scale object detector utilizes a region proposal model, a concept mask model, and an auto tagging model to automatically detect objects in the digital image.
Systems and Methods for Contrastive Learning of Visual Representations
Systems, methods, and computer program products for performing semi-supervised contrastive learning of visual representations are provided. For example, the present disclosure provides systems and methods that leverage particular data augmentation schemes and a learnable nonlinear transformation between the representation and the contrastive loss to provide improved visual representations. Further, the present disclosure also provides improvements for semi-supervised contrastive learning. For example, a computer-implemented method may include performing semi-supervised contrastive learning based on a set of one or more unlabeled training data, generating an image classification model based on a portion of a plurality of layers in a projection head neural network used in performing the contrastive learning, performing fine-tuning of the image classification model based on a set of one or more labeled training data, and, after performing the fine-tuning, distilling the image classification model to a student model comprising a smaller number of parameters than the image classification model.
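The contrastive loss can be illustrated with the NT-Xent (normalized temperature-scaled cross-entropy) form commonly used in this line of work; the temperature value and batch construction below are assumptions for the sketch:

```python
import numpy as np

def nt_xent_loss(z1, z2, tau=0.5):
    """NT-Xent contrastive loss sketch: each view's positive pair is the
    other augmentation of the same image; all other samples in the batch
    act as negatives. z1, z2: (B, D) projection-head outputs."""
    z = np.concatenate([z1, z2])                       # (2B, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / tau                                # (2B, 2B) logits
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    b = len(z1)
    targets = np.concatenate([np.arange(b, 2 * b), np.arange(b)])
    m = sim.max(axis=1, keepdims=True)                 # stable log-softmax
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * b), targets].mean())

rng = np.random.default_rng(0)
loss = nt_xent_loss(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
```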
Self-supervised cross-video temporal difference learning for unsupervised domain adaptation
A method is provided for Cross Video Temporal Difference (CVTD) learning. The method adapts a source domain video to a target domain video using a CVTD loss. The source domain video is annotated, and the target domain video is unannotated. The CVTD loss is computed by quantizing clips derived from the source and target domain videos by dividing the source domain video into source domain clips and the target domain video into target domain clips. The CVTD loss is further computed by sampling two clips from each of the source domain clips and the target domain clips to obtain four sampled clips including a first source domain clip, a second source domain clip, a first target domain clip, and a second target domain clip. The CVTD loss is computed as |(second source domain clip−first source domain clip)−(second target domain clip−first target domain clip)|.
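The CVTD loss formula stated above translates directly to code; only the mean reduction over clip feature arrays is an added assumption:

```python
import numpy as np

def cvtd_loss(src_clip1, src_clip2, tgt_clip1, tgt_clip2):
    """Cross Video Temporal Difference loss as stated in the abstract:
    |(second source clip - first source clip)
     - (second target clip - first target clip)|,
    computed on clip feature arrays and reduced to a scalar by averaging."""
    diff = (src_clip2 - src_clip1) - (tgt_clip2 - tgt_clip1)
    return float(np.abs(diff).mean())
```

Matching temporal differences across domains in this way pushes the two videos' clip dynamics, rather than their raw appearances, to align.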
Methods and systems for monitoring objects for labelling
A graphical user interface (GUI) for forming hierarchically arranged clusters of items and operating thereupon through an electronic device equipped with an input device and a display screen is provided. The GUI comprises a first area configured to display a graphical-tree representation having a plurality of hierarchical levels, each of said levels corresponding to at least one cluster of content-items formed by execution of a machine-learning classifier over a plurality of input content-items. A second area is configured to display a dataset corresponding to the content-items classified within the clusters. A third area is configured to display a plurality of types of content representations with respect to each selected cluster, said representations corresponding to content-items classified within the cluster.
METHOD AND APPARATUS FOR SEMI-SUPERVISED LEARNING
Provided is a computer-implemented method for training a machine learning (ML) model using labelled and unlabelled data, the method comprising: obtaining a set of training data comprising a set of labelled data items and a set of unlabelled data items; training a loss module of the ML model using labels in the set of labelled data items, to generate a trained loss module capable of estimating a likelihood of a label for a data item; and training a task module of the ML model using the loss module, the set of labelled data items, and the set of unlabelled data items, to generate a trained task module capable of making a prediction of a label for input data.