Patent classifications
G06V10/7753
METHOD AND SYSTEM FOR GENERATING AND LABELLING REFERENCE IMAGES
The invention relates to a method and system for automatically generating and labelling reference images. In some embodiments, the method includes tracking a plurality of highlighted objects in a set of input images, along with audio data associated with the plurality of highlighted objects. The method further includes cropping each of the plurality of highlighted objects from each of the set of input images based on the tracking, contemporaneously capturing an audio clip associated with each of the plurality of highlighted objects from the audio data based on the tracking, and labelling each of the plurality of highlighted objects based on text data generated from the audio clip associated with each of the plurality of highlighted objects to generate a labelled reference image.
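As a rough illustration of the crop-and-label flow described in this abstract, the sketch below pairs each tracked object with its co-occurring audio span. The track record layout and the transcribe() helper are assumptions for illustration, not the patent's actual implementation; any ASR engine could stand in for transcribe().

```python
# Minimal sketch, assuming frames are PIL-style images with a .crop() method
# and that each track records the audio span that overlaps the object.
from dataclasses import dataclass

@dataclass
class Track:
    object_id: int
    frame_index: int
    bbox: tuple          # (left, top, right, bottom) in pixels
    audio_span: tuple    # (start_sec, end_sec) within the audio stream

def transcribe(audio_path: str, start: float, end: float) -> str:
    """Hypothetical speech-to-text helper; swap in any ASR engine."""
    raise NotImplementedError

def build_labelled_references(frames, audio_path, tracks):
    """Crop each tracked object and label it from the contemporaneous audio."""
    references = []
    for t in tracks:
        crop = frames[t.frame_index].crop(t.bbox)        # cropped object image
        label = transcribe(audio_path, *t.audio_span)    # text from the audio clip
        references.append((crop, label))                 # one labelled reference image
    return references
```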
MODEL TRAINING USING PARTIALLY-ANNOTATED IMAGES
Methods and systems for training a model that labels two or more organic structures in an image. One method includes receiving a set of training images including a first plurality of images and a second plurality of images. Each of the first plurality of images includes a label for a first subset of the two or more organic structures, and each of the second plurality of images includes a label for a second subset of the two or more organic structures, the second subset being different from the first subset. The method also includes training the model using the first plurality of images, the second plurality of images, and a label merging function mapping a label included in the first plurality of images to a label included in the second plurality of images.
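A minimal sketch of what a label merging function could look like: the concrete mapping below (fine anatomical labels collapsing to a coarser shared label) is purely illustrative and not taken from the patent.

```python
# Assumed example: image set A labels {"liver", "kidney"}, while set B labels
# only the coarser class {"organ"}; the merge maps A's labels into B's space.
MERGE = {"liver": "organ", "kidney": "organ"}

def merge_label(label: str) -> str:
    """Map a label from the first image set onto the second set's label space."""
    return MERGE.get(label, label)

# With both annotation schemes mapped into one shared label space, the two
# image sets can contribute to a single training loss for the same model.
```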
Method for explainable active learning, to be used for an object detector, by using deep autoencoder and active learning device using the same
A method for explainable active learning, to be used for an object detector, by using a deep autoencoder is provided. The method includes steps of an active learning device (a) (i) inputting acquired test images into the object detector to detect objects and output bounding boxes, (ii) cropping regions, corresponding to the bounding boxes, in the test images, (iii) resizing the test images and the cropped images into a same size, and (iv) inputting the resized images into a data encoder of the deep autoencoder to output data codes, and (b) (i) confirming reference data codes, corresponding to resized images whose count is less than a counter threshold, by referring to a data codebook, (ii) extracting specific data codes from the data codes, (iii) selecting specific test images as rare samples, and (iv) updating the data codebook by referring to the specific data codes.
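A condensed sketch of the rare-sample selection step: images whose autoencoder data code has been seen fewer than a counter threshold of times are kept for annotation, and the codebook is then updated. The encode() callable stands in for the deep autoencoder's data encoder and is an assumption here.

```python
# Assumes encode() maps a resized image to a hashable data code (e.g. a tuple).
from collections import Counter

def select_rare_samples(resized_images, encode, codebook: Counter, counter_threshold: int):
    codes = [encode(img) for img in resized_images]      # data codes per image
    rare = [img for img, c in zip(resized_images, codes)
            if codebook[c] < counter_threshold]          # under-represented codes
    codebook.update(codes)                               # update the data codebook
    return rare
```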
TRANSFORM DISENTANGLING AUTO-ENCODER AND RELATED METHODS
Discussed herein are devices, systems, and methods for disentangling static and dynamic features of content. A method can include encoding, by a transform disentangling autoencoder (AE), first content to generate first static features and first dynamic features and second content to generate second static features and second dynamic features, and constructing, by the AE, third content based on a combination of third static features and the first dynamic features and fourth content based on a combination of fourth static features and the second dynamic features, the third and fourth static features being determined based on the first static features and the second static features.
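The sketch below illustrates the feature swap described above: static features from two inputs are mixed (here, simply averaged, which is only one possible choice) and recombined with each input's own dynamic features before decoding. Encoder and decoder internals and all layer sizes are placeholders, not the patent's architecture.

```python
import torch
import torch.nn as nn

class TransformDisentanglingAE(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.enc_static = nn.Linear(64, dim)     # placeholder static encoder
        self.enc_dynamic = nn.Linear(64, dim)    # placeholder dynamic encoder
        self.dec = nn.Linear(2 * dim, 64)        # placeholder decoder

    def forward(self, x1, x2):
        s1, d1 = self.enc_static(x1), self.enc_dynamic(x1)
        s2, d2 = self.enc_static(x2), self.enc_dynamic(x2)
        s_mix = 0.5 * (s1 + s2)                  # third/fourth static features
        x3 = self.dec(torch.cat([s_mix, d1], dim=-1))   # third content
        x4 = self.dec(torch.cat([s_mix, d2], dim=-1))   # fourth content
        return x3, x4
```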
UNSUPERVISED REPRESENTATION LEARNING WITH CONTRASTIVE PROTOTYPES
The system and method are directed to prototypical contrastive learning (PCL). PCL explicitly encodes the hierarchical semantic structure of the dataset into the learned embedding space and prevents the network from exploiting low-level cues to solve the unsupervised learning task. PCL introduces prototypes as latent variables to help find the maximum-likelihood estimate of the network parameters in an expectation-maximization framework. PCL iteratively performs an E-step, finding prototypes by clustering, and an M-step, optimizing the network on a contrastive loss.
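A condensed sketch of the EM loop: the E-step clusters embeddings into prototypes, and the M-step pulls each embedding toward its assigned prototype with a contrastive (ProtoNCE-style) loss. The k-means settings and the temperature are illustrative choices, not the patent's parameters.

```python
import torch
from sklearn.cluster import KMeans

def e_step(embeddings: torch.Tensor, k: int = 8):
    """Cluster embeddings; cluster centers act as prototypes."""
    km = KMeans(n_clusters=k, n_init=10).fit(embeddings.detach().cpu().numpy())
    prototypes = torch.tensor(km.cluster_centers_, dtype=embeddings.dtype)
    assignments = torch.tensor(km.labels_, dtype=torch.long)
    return prototypes, assignments

def m_step_loss(embeddings, prototypes, assignments, tau: float = 0.1):
    """Contrastive loss against prototypes at temperature tau."""
    logits = embeddings @ prototypes.T / tau     # similarity to every prototype
    return torch.nn.functional.cross_entropy(logits, assignments)
```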
MODEL TRAINING USING FULLY AND PARTIALLY-ANNOTATED IMAGES
Methods and systems for training a model that labels two or more organic structures within an image. One method includes receiving a set of training images including a first plurality of images and a second plurality of images. Each of the first plurality of images includes a label for each of the two or more organic structures, and each of the second plurality of images includes a label for only a subset of the two or more organic structures. The method further includes training the model using the first plurality of images, the second plurality of images, and a label merging function mapping a label from the first plurality of images to a label included in the second plurality of images.
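One simple way to mix fully and partially annotated images in a single loss, shown as a sketch: structures that a partially annotated image does not label are masked out. The ignore-index convention below is an assumption for illustration, not the patent's mechanism.

```python
import torch.nn.functional as F

IGNORE = -100  # marks pixels/structures without a label in partially annotated images

def segmentation_loss(logits, target):
    """logits: (N, C, H, W); target: (N, H, W), with IGNORE where unlabeled."""
    return F.cross_entropy(logits, target, ignore_index=IGNORE)

# A label merging function can first map every image's labels into a shared
# class space, after which both image sets drive the same masked loss.
```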
MODEL TRAINING METHOD AND RELATED DEVICE
This application provides a model training method in the artificial intelligence field. In the process of determining a loss used to update a model parameter, the difference between predictions for related source domain images is taken into account, so the obtained neural network has a strong generalization capability. The method in this application includes: obtaining a first source domain image associated with a target domain image and a second source domain image associated with the target domain image; obtaining a first prediction label of the first source domain image and a second prediction label of the second source domain image through a first to-be-trained model; obtaining a first loss based on the first prediction label and the second prediction label, where the first loss indicates a difference between the first prediction label and the second prediction label; and updating a parameter of the first to-be-trained model based on the first loss, to obtain a first neural network.
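A minimal sketch of the first loss: a consistency term between the model's predictions on two source domain images that are associated with the same target domain image. The model interface and the choice of L1 distance are assumptions, not the application's definition.

```python
def first_loss(model, src_img_1, src_img_2):
    """Difference between the two prediction labels (torch tensors assumed)."""
    pred_1 = model(src_img_1)                 # first prediction label
    pred_2 = model(src_img_2)                 # second prediction label
    return (pred_1 - pred_2).abs().mean()     # L1 difference as the first loss

# Typical update step:
# optimizer.zero_grad(); first_loss(model, a, b).backward(); optimizer.step()
```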
TRAINING MACHINE LEARNING MODELS BASED ON UNLABELED DATA
A method of labeling data and training a model is provided. The method includes obtaining a set of images. The set of images includes a first subset and a second subset, the first subset being associated with a first set of labels. The method also includes generating a set of pseudo labels for the set of images and a second set of labels for the second subset based on the first subset, the second subset, a first machine learning model, and a domain adaptation model. The method further includes generating a second machine learning model based on the set of images, the set of pseudo labels, the first set of labels, and the second set of labels. The second set of labels is updated based on one or more inferences generated by the second machine learning model.
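A high-level sketch of that loop: pseudo labels come from the first model, a domain adaptation step produces the second set of labels, a second model is trained on everything, and its inferences refresh the second set of labels. Every callable here (first_model, adapt, fit) is a placeholder for the components named in the abstract.

```python
def train_with_unlabeled(images, labeled_idx, labels, first_model, adapt, fit):
    pseudo = [first_model(img) for img in images]          # pseudo labels for all images
    second = {i: adapt(pseudo[i]) for i in range(len(images))
              if i not in labeled_idx}                     # second set of labels
    second_model = fit(images, pseudo, labels, second)     # train the second model
    for i in second:                                       # update from its inferences
        second[i] = second_model(images[i])
    return second_model, second
```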
VIDEO ACTION SEGMENTATION BY MIXED TEMPORAL DOMAIN ADAPTATION
Embodiments herein treat action segmentation as a domain adaptation (DA) problem and reduce the domain discrepancy by performing unsupervised DA with auxiliary unlabeled videos. In one or more embodiments, to reduce domain discrepancy in both the spatial and temporal directions, a Mixed Temporal Domain Adaptation (MTDA) approach is presented to jointly align frame-level and video-level embedded feature spaces across domains. In one or more embodiments, this is further integrated with a domain attention mechanism that focuses on aligning the frame-level features with higher domain discrepancy, leading to more effective domain adaptation. Comprehensive experimental results validate that embodiments outperform previous state-of-the-art methods. Embodiments can adapt models effectively by using auxiliary unlabeled videos, enabling further applications to large-scale problems, such as video surveillance and human activity analysis.
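A sketch of the joint alignment idea: a gradient reversal layer feeds frame-level and video-level features to two domain discriminators, and a per-frame attention weight (derived here from the frame discriminator's output, which is only one simple weighting choice) emphasizes frames before video-level pooling. Discriminator architectures and sizes are not specified here and are assumptions.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity forward; reversed gradients for adversarial domain training."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, g):
        return -g

def domain_outputs(frame_feats, disc_frame, disc_video):
    """frame_feats: (T, D) features of one video; returns domain logits."""
    d_frame = disc_frame(GradReverse.apply(frame_feats))      # per-frame logits (T, 1)
    attn = torch.sigmoid(d_frame)                             # illustrative frame weights
    video_feat = (attn * frame_feats).sum(0) / attn.sum()     # attended video feature (D,)
    d_video = disc_video(GradReverse.apply(video_feat))       # video-level logit
    return d_frame, d_video                                   # fed to a BCE domain loss
```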
Determining a lighting configuration based on context
During operation, a computer generates and provides a simulation of an environment with an initial lighting configuration, based at least in part on the initial lighting configuration and a layout of the environment. Note that the initial lighting configuration includes one or more lights at predefined or predetermined locations in the environment and dynamic lighting states of the one or more lights, where a dynamic lighting state of a given light includes an intensity and a color of the given light. Moreover, based at least in part on the initial lighting configuration, the layout of the environment, and a determined context of the environment, the computer modifies the initial lighting configuration to obtain an updated lighting configuration. Next, based at least in part on the updated lighting configuration and the layout, the computer generates and selectively provides an updated simulation of the environment with the updated lighting configuration.
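The data shapes involved can be made concrete with a small sketch: a dynamic lighting state holding intensity and color, and a context-driven rule that produces the updated configuration. The specific context value and the dimming rule below are hypothetical, standing in for the computer's context-determination step.

```python
from dataclasses import dataclass

@dataclass
class LightState:
    intensity: float   # e.g. 0.0 .. 1.0
    color: tuple       # (r, g, b)

def update_configuration(config: dict, context: str) -> dict:
    """config maps a location in the layout to a LightState."""
    updated = dict(config)
    if context == "movie_night":                  # hypothetical determined context
        for loc, state in updated.items():
            updated[loc] = LightState(state.intensity * 0.3, state.color)
    return updated
```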