Patent classifications
G06V10/7753
Method, apparatus, and electronic device for training neural network model
The present disclosure relates to a method for training a neural network model performed at an electronic device. The method includes: performing initial training by using a first training sample set to obtain an initial neural network model; performing a prediction on a second training sample set by using the initial neural network model to obtain a prediction result for each of the training samples in the second training sample set; determining a plurality of preferred samples from the second training sample set based on the prediction results; adding the plurality of preferred samples, once annotated, to the first training sample set to obtain an expanded first training sample set; and updating the training of the neural network model by using the expanded first training sample set to obtain an updated neural network model, repeating these steps until a training ending condition is satisfied.
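The iterative loop described above can be sketched as follows. The abstract does not specify how "preferred" samples are determined; this sketch assumes an uncertainty criterion (lowest prediction confidence), and `train`, `predict_confidence`, and `annotate` are hypothetical callables supplied by the caller.

```python
def select_preferred(confidences, k):
    """Pick the k pool samples whose predictions are least confident."""
    ranked = sorted(confidences, key=lambda item: item[1])
    return [sample for sample, _ in ranked[:k]]

def active_training_loop(labeled, pool, train, predict_confidence, annotate,
                         rounds=3, k=2):
    """Train, predict on the pool, annotate preferred samples, retrain."""
    model = train(labeled)                                  # initial training
    for _ in range(rounds):                                 # ending condition
        confidences = [(s, predict_confidence(model, s)) for s in pool]
        preferred = select_preferred(confidences, k)
        labeled = labeled + [annotate(s) for s in preferred]  # expanded set
        pool = [s for s in pool if s not in preferred]
        model = train(labeled)                              # updated model
    return model, labeled
```

A fixed round count stands in for the unspecified "training ending condition"; a validation-accuracy test could replace it.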
Leveraging unsupervised meta-learning to boost few-shot action recognition
The disclosure herein describes preparing and using a cross-attention module for action recognition using pre-trained encoders and novel-class fine-tuning. Training video data is transformed into augmented training video segments, which are used to train an appearance encoder and an action encoder. The appearance encoder is trained to encode video segments based on spatial semantics, and the action encoder is trained to encode video segments based on spatio-temporal semantics. A set of hard-mined training episodes is generated using the trained encoders. The cross-attention module is then trained for action-appearance aligned classification using the hard-mined training episodes. Then, support video segments are obtained, wherein each support video segment is associated with a video class. The cross-attention module is fine-tuned using the obtained support video segments and the associated video classes. A query video segment is obtained and classified as a video class using the fine-tuned cross-attention module.
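The abstract does not detail the cross-attention computation itself, but the final query-classification step can be illustrated with a generic episodic matcher. This sketch assumes prototype-style nearest-neighbour matching over support embeddings with cosine similarity, a simple stand-in for the action-appearance aligned scoring; all names here are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def classify_query(query_embedding, support):
    """Assign the query segment the class of its most similar support
    segment. `support` is a list of (embedding, video_class) pairs."""
    _, best_class = max(support,
                        key=lambda pair: cosine(query_embedding, pair[0]))
    return best_class
```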
UNSUPERVISED DOMAIN ADAPTATION WITH NEURAL NETWORKS
Approaches presented herein provide for unsupervised domain transfer learning. In particular, three neural networks can be trained together using at least labeled data from a first domain and unlabeled data from a second domain. Features of the data are extracted using a feature extraction network. A first classifier network uses these features to classify the data, while a second classifier network uses these features to determine the relevant domain. A combined loss function is used to optimize the networks, with the goal that the feature extraction network extracts features that the first classifier network can use to accurately classify the data, while preventing the second classifier network from determining the domain of the input. Such optimization enables object classification to be performed with high accuracy for either domain, even though there may have been little to no labeled training data for the second domain.
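The combined objective described above can be sketched as two per-network objectives in the spirit of domain-adversarial training. The weighting factor `lam` and the per-batch loss values are hypothetical; the abstract does not name the exact loss formulation.

```python
def feature_extractor_objective(class_loss, domain_loss, lam=1.0):
    """Objective for the feature extraction network: minimize the first
    classifier's loss while maximizing the domain classifier's loss, so
    the extracted features carry class information but hide the domain."""
    return class_loss - lam * domain_loss

def domain_classifier_objective(domain_loss):
    """The second (domain) classifier simply minimizes its own loss,
    creating the adversarial game with the feature extractor."""
    return domain_loss
```

In practice this sign flip is often implemented with a gradient-reversal layer so all three networks train in one backward pass.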
Deep Active Learning Method for Civil Infrastructure Defect Detection
An image processing system includes a memory to store a classifier and a set of labeled images for training the classifier, wherein each labeled image is labeled as either a positive image that includes an object of a specific type or a negative image that does not include the object of the specific type, and wherein the set of labeled images has a first ratio of the positive images to the negative images. The system includes an input interface to receive a set of input images; a processor to determine a second ratio of the positive images to the negative images, to classify the input images into positive and negative images to produce a set of classified images, and to select a subset of the classified images having the second ratio of the positive images to the negative images; and an output interface to render the subset of the classified images for labeling.
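The ratio-constrained subset selection can be sketched as below. The abstract does not state how the ratio is represented; this sketch assumes it is expressed as positives per negative (`ratio`:1), and the function names are illustrative.

```python
def select_with_ratio(positives, negatives, ratio, total):
    """Select `total` classified images so positives:negatives ~= ratio:1."""
    n_pos = round(total * ratio / (ratio + 1))
    n_neg = total - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough classified images for requested ratio")
    return positives[:n_pos] + negatives[:n_neg]
```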
TRAFFIC VIOLATION PREDICTION
Systems and methods for traffic violation prediction. The systems and methods include obtaining a plurality of bounding boxes of road scene categories from an input dataset by employing a pre-trained detection model. A plurality of pseudo-labels of road scene categories for the plurality of bounding boxes can be obtained by employing the pre-trained detection model. A labeled dataset can be obtained by filtering the input dataset for images having the plurality of pseudo-labels and the plurality of bounding boxes. A traffic violation prediction model can be trained with both the unlabeled and the labeled datasets, including the road scene categories obtained from the pre-trained detection model, to predict simultaneous traffic violations of one or more riders in a road scene.
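The filtering step that builds the labeled dataset can be sketched as follows. `detect` is a hypothetical stand-in for the pre-trained detection model, assumed to return bounding boxes and pseudo-labels for an image.

```python
def split_by_pseudo_labels(dataset, detect):
    """Partition images: those for which the pre-trained detector returns
    both bounding boxes and pseudo-labels form the labeled dataset; the
    rest remain in the unlabeled pool."""
    labeled, unlabeled = [], []
    for image in dataset:
        boxes, pseudo_labels = detect(image)
        if boxes and pseudo_labels:
            labeled.append((image, boxes, pseudo_labels))
        else:
            unlabeled.append(image)
    return labeled, unlabeled
```

Both partitions are then available for the semi-supervised training of the violation prediction model.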
METHOD, APPARATUS, DEVICE AND STORAGE MEDIUM FOR IMAGE SEGMENTATION
Embodiments of the disclosure provide technologies for image segmentation. The method includes: extracting an image feature representation of a target image using a trained image encoder; for each of a plurality of classes, generating, using a trained text encoder, a text feature representation corresponding to a name of the class, and determining a candidate segmentation map for the target image and a class confidence of the class based on the image feature representation and the text feature representation; selecting, from the plurality of classes, at least one class related to the target image based on a plurality of class confidences determined respectively for the plurality of classes; and determining a target segmentation map for the target image based on the at least one candidate segmentation map and the at least one class confidence determined for the at least one selected class.
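The class-selection and map-merging steps can be sketched as below. The confidence threshold and the per-pixel argmax merge are assumptions (the abstract only says selection is "based on" the confidences), and candidate maps are represented as flat per-pixel score lists for brevity.

```python
def select_classes(class_confidences, threshold=0.5):
    """Keep only the classes whose confidence marks them as relevant."""
    return [c for c, conf in class_confidences.items() if conf >= threshold]

def merge_segmentation(candidate_maps, selected):
    """Per pixel, pick the selected class whose candidate map scores
    highest. Each candidate map is a flat list of per-pixel scores."""
    n_pixels = len(candidate_maps[selected[0]])
    return [max(selected, key=lambda c: candidate_maps[c][i])
            for i in range(n_pixels)]
```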
MEDICAL IMAGE PROCESSING METHOD, MEDICAL IMAGE PROCESSING APPARATUS, AND STORAGE MEDIUM
A medical image processing method according to an embodiment of the present disclosure includes: training a deep neural network by using labeled image data; obtaining a first augmented image by carrying out a weak data augmentation on unlabeled image data; performing a predicting process on the first augmented image by using the deep neural network and determining, on the basis of prediction information of each pixel, whether the prediction for that pixel in the first augmented image is able to serve as a pseudo-label; obtaining a second augmented image by carrying out a strong data augmentation on the first augmented image; training the deep neural network by using the second augmented image and the pseudo-labels; and updating the deep neural network on the basis of training results of the labeled image data and the unlabeled image data, and processing a medical image by using the updated deep neural network.
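The per-pixel pseudo-label gating can be sketched as follows. The confidence threshold is a hypothetical choice; the abstract only says the decision is based on each pixel's prediction information.

```python
def pixel_pseudo_labels(weak_predictions, threshold=0.9):
    """For each pixel, keep the class predicted on the weakly augmented
    image as a pseudo-label only if its confidence clears the threshold;
    uncertain pixels are marked None and ignored in the loss."""
    return [cls if conf >= threshold else None
            for cls, conf in weak_predictions]
```

The retained pseudo-labels then supervise the network's predictions on the strongly augmented version of the same image.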
SYSTEM FOR SIMPLIFIED GENERATION OF SYSTEMS FOR BROAD AREA GEOSPATIAL OBJECT DETECTION
A system for simplified generation of systems for analysis of satellite images to geolocate one or more objects of interest. A plurality of training images, labeled either for a study object or for objects with irrelevant features, is loaded into a preexisting feature identification subsystem, causing automated generation of models for the study object. These models are used to parameterize pre-engineered machine learning elements running a preprogrammed machine learning protocol. Training images containing the study object are used to train object recognition filters, which are then used to identify the study object in unanalyzed images. The system reports results in a requestor's preferred format.
METHOD AND SYSTEM FOR SIMULTANEOUS SCENE PARSING AND MODEL FUSION FOR ENDOSCOPIC AND LAPAROSCOPIC NAVIGATION
A method and system for scene parsing and model fusion in laparoscopic and endoscopic 2D/2.5D image data is disclosed. A current frame of an intra-operative image stream including a 2D image channel and a 2.5D depth channel is received. A 3D pre-operative model of a target organ segmented in pre-operative 3D medical image data is fused to the current frame of the intra-operative image stream. Semantic label information is propagated from the pre-operative 3D medical image data to each of a plurality of pixels in the current frame of the intra-operative image stream based on the fused pre-operative 3D model of the target organ, resulting in a rendered label map for the current frame of the intra-operative image stream. A semantic classifier is trained based on the rendered label map for the current frame of the intra-operative image stream.
SYSTEM AND METHOD FOR CLASSIFYING AND SEGMENTING MICROSCOPY IMAGES WITH DEEP MULTIPLE INSTANCE LEARNING
Systems and methods that receive microscopy images as input, extract features, and apply layers of processing units to compute one or more sets of cellular phenotype features, corresponding to cellular densities and/or fluorescence measured under different conditions. The system is a neural network architecture having a convolutional neural network followed by a multiple instance learning (MIL) pooling layer. The system does not necessarily require any segmentation steps or per-cell labels, as the convolutional neural network can be trained and tested directly on raw microscopy images in real time. The system computes class-specific feature maps for every phenotype variable using a fully convolutional neural network and uses multiple instance learning to aggregate across these class-specific feature maps. The system produces predictions for one or more reference cellular phenotype variables based on microscopy images with populations of cells.
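The MIL aggregation step can be sketched as below. The abstract does not name the pooling function; global max pooling is assumed here as one common MIL choice, and the feature maps are represented as small 2D score grids.

```python
def mil_pool(class_feature_map):
    """Aggregate a per-location class feature map (rows of scores) into a
    single image-level score via global max pooling, one common MIL
    pooling choice."""
    return max(max(row) for row in class_feature_map)

def predict_phenotypes(class_feature_maps):
    """Image-level prediction per phenotype variable from its class-specific
    feature map, without any per-cell labels or segmentation."""
    return {phenotype: mil_pool(fmap)
            for phenotype, fmap in class_feature_maps.items()}
```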