G06V10/7753

SYSTEM AND METHOD FOR VEHICLE OCCLUSION DETECTION
20190272433 · 2019-09-05 ·

A system and method for vehicle occlusion detection is disclosed. A particular embodiment includes: receiving training image data from a training image data collection system; obtaining ground truth data corresponding to the training image data; performing a training phase to train a plurality of classifiers, a first classifier being trained for processing static images of the training image data, a second classifier being trained for processing image sequences of the training image data; receiving image data from an image data collection system associated with an autonomous vehicle; and performing an operational phase including performing feature extraction on the image data, determining a presence of an extracted feature instance in multiple image frames of the image data by tracing the extracted feature instance back to a previous plurality of N frames relative to a current frame, applying the first trained classifier to the extracted feature instance if the extracted feature instance cannot be determined to be present in multiple image frames of the image data, and applying the second trained classifier to the extracted feature instance if the extracted feature instance can be determined to be present in multiple image frames of the image data.
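The classifier-selection step described in this abstract can be sketched as follows. The tracking representation (per-frame sets of instance IDs), the function names, and the default N are illustrative assumptions, not the patent's actual implementation.

```python
def select_and_classify(instance_id, frame_history, classify_static,
                        classify_sequence, n=5):
    """Apply the sequence classifier if the extracted feature instance can be
    traced back through the previous n frames; otherwise fall back to the
    static-image classifier, as the abstract describes."""
    recent = frame_history[-n:]
    present_in_all = len(recent) == n and all(
        instance_id in frame for frame in recent)
    if present_in_all:
        return classify_sequence(instance_id, recent)
    return classify_static(instance_id)
```

Here each element of `frame_history` is the set of instance IDs detected in one frame, standing in for the feature-extraction output.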

System and method for learning random-walk label propagation for weakly-supervised semantic segmentation
10402690 · 2019-09-03 ·

Systems and methods for training semantic segmentation. Embodiments of the present invention include predicting semantic labeling of each pixel in each of at least one training image using a semantic segmentation model. Further included is predicting semantic boundaries at boundary pixels of objects in the at least one training image using a semantic boundary model concurrently with predicting the semantic labeling. Also included is propagating sparse labels to every pixel in the at least one training image using the predicted semantic boundaries. Additionally, the embodiments include optimizing a loss function according to the predicted semantic labeling and the propagated sparse labels to concurrently train the semantic segmentation model and the semantic boundary model to accurately and efficiently generate a learned semantic segmentation model from sparsely annotated training images.
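The label-propagation step can be illustrated on a 1-D row of pixels: sparse labels spread outward but stop at predicted semantic boundaries. A simple bidirectional flood fill is used here as a stand-in for the random-walk formulation in the abstract; the encoding (-1 for unlabeled, booleans for boundary pixels) is an assumption.

```python
import numpy as np

def propagate_labels(sparse_labels, boundaries):
    """sparse_labels: integer labels, -1 where a pixel is unlabeled.
    boundaries: True where the boundary model predicts a boundary pixel.
    Labels propagate to adjacent unlabeled pixels but never onto a
    boundary pixel; on a 1-D row one forward and one backward sweep
    suffice to reach every reachable pixel."""
    labels = np.array(sparse_labels, dtype=int)
    n = len(labels)
    for i in range(1, n):                      # left-to-right sweep
        if labels[i] == -1 and not boundaries[i] and labels[i - 1] != -1:
            labels[i] = labels[i - 1]
    for i in range(n - 2, -1, -1):             # right-to-left sweep
        if labels[i] == -1 and not boundaries[i] and labels[i + 1] != -1:
            labels[i] = labels[i + 1]
    return labels
```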

INTEGRATED MACHINE LEARNING AUDIOVISUAL APPLICATION FOR A DEFINED SUBJECT

Disclosed herein are system, method, and computer program product embodiments for utilizing a feedback loop to continuously improve an artificial intelligence (AI) engine's determination of predictive features associated with a topic. An embodiment operates by training an AI engine for a topic using data from a data source, wherein the topic is associated with a geolocation. The embodiment first receives a set of predictive features for the topic from the trained AI engine. The embodiment transmits the set of predictive features for the topic to a set of electronic devices. The embodiment second receives a set of audiovisual content captured by the set of electronic devices. The set of electronic devices capture the set of audiovisual content based on the set of predictive features for the topic. The embodiment finally retrains the AI engine based on the set of audiovisual content.
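The feedback loop described above can be sketched as a train/predict/capture/retrain cycle. The `Engine` and device interfaces below are illustrative stand-ins invented for this sketch, not the patent's architecture.

```python
def feedback_loop(engine, data_source, devices, rounds=2):
    """Train the engine, then repeatedly: obtain predictive features,
    have the devices capture audiovisual content guided by those
    features, and retrain the engine on the captured content."""
    engine.train(data_source)
    for _ in range(rounds):
        features = engine.predict_features()
        content = [device.capture(features) for device in devices]
        engine.retrain(content)
    return engine
```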

Image classifier learning device, image classifier learning method, and program

An object is to make it possible to train an image recognizer by efficiently using training data that does not include label information. A determination unit 180 causes repeated execution of the following. A feature representation model for extracting feature vectors of pixels is trained so as to minimize an objective function that includes: a value based on the difference between the distance between the feature vectors of two pixels labeled with a positive example label and the distance between the feature vector of a positively labeled pixel and that of an unlabeled pixel; and a value based on the difference between that positive-to-unlabeled distance and the distance between the feature vector of a positively labeled pixel and that of a pixel labeled with a negative example label. Then, based on the distribution of feature vectors corresponding to the positive example label, a predetermined number of labels are assigned according to the likelihood that each unlabeled pixel is a positive example.
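The objective function can be sketched as a pair of ranking terms: the positive-to-positive distance should be smaller than the positive-to-unlabeled distance, which in turn should be smaller than the positive-to-negative distance. The hinge form and margin below are assumptions for illustration, not the patent's exact formulation.

```python
import numpy as np

def ranking_objective(pos_a, pos_b, unlabeled, negative, margin=1.0):
    """Two hinge terms, one per distance difference in the abstract."""
    d_pp = np.linalg.norm(pos_a - pos_b)      # positive vs. positive
    d_pu = np.linalg.norm(pos_a - unlabeled)  # positive vs. unlabeled
    d_pn = np.linalg.norm(pos_a - negative)   # positive vs. negative
    return max(0.0, d_pp - d_pu + margin) + max(0.0, d_pu - d_pn + margin)
```

When the three distances are well separated by at least the margin, the loss is zero; otherwise the violating term contributes linearly.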

Unsupervised domain adaptation with neural networks

Approaches presented herein provide for unsupervised domain transfer learning. In particular, three neural networks can be trained together using at least labeled data from a first domain and unlabeled data from a second domain. Features of the data are extracted using a feature extraction network. A first classifier network uses these features to classify the data, while a second classifier network uses these features to determine the relevant domain. A combined loss function is used to optimize the networks, with the goal that the feature extraction network extracts features that the first classifier network can use to accurately classify the data while preventing the second classifier network from determining the domain of the data. Such optimization enables object classification to be performed with high accuracy for either domain, even though there may have been little to no labeled training data for the second domain.
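The combined loss can be sketched as a classification cross-entropy minus a weighted domain cross-entropy, so that minimizing it rewards features that help the label classifier and hurt the domain classifier. This is a numerical sketch of the adversarial objective only; in practice such schemes typically use a gradient-reversal mechanism during backpropagation, which is omitted here, and the weight `lam` is an assumed hyperparameter.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def combined_loss(class_logits, domain_logits, class_label, domain_label,
                  lam=0.1):
    """Cross-entropy of the label classifier minus lam times the domain
    classifier's cross-entropy: lower values mean good classification
    and poor domain discrimination, as the abstract's goal requires."""
    ce_class = -np.log(softmax(class_logits)[class_label])
    ce_domain = -np.log(softmax(domain_logits)[domain_label])
    return ce_class - lam * ce_domain
```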

Method and system for generating and labelling reference images

The invention relates to a method and system for automatically generating and labelling reference images. In some embodiments, the method includes tracking a plurality of highlighted objects in a set of input images along with audio data associated with the plurality of highlighted objects. The method further includes cropping each of the plurality of highlighted objects from each of the set of input images based on the tracking, contemporaneously capturing an audio clip associated with each of the plurality of highlighted objects from the audio data based on the tracking, and labelling each of the plurality of highlighted objects based on text data generated from the audio clip associated with each of the plurality of highlighted objects to generate a labelled reference image.
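The crop-and-label pipeline can be sketched as follows: tracked bounding boxes are cropped from each frame and paired with text generated from the object's audio clip. The data layout and the `transcribe` callable are illustrative assumptions standing in for a real tracker and speech-to-text step.

```python
def label_reference_images(frames, tracks, audio_clips, transcribe):
    """frames: list of image arrays indexed by frame number.
    tracks: {object_id: [(frame_idx, (x0, y0, x1, y1)), ...]}.
    audio_clips: {object_id: audio data captured while tracking}.
    transcribe: callable turning an audio clip into label text."""
    labelled = []
    for obj_id, boxes in tracks.items():
        label = transcribe(audio_clips[obj_id])   # text from the audio clip
        for frame_idx, (x0, y0, x1, y1) in boxes:
            crop = frames[frame_idx][y0:y1, x0:x1]  # cropped highlighted object
            labelled.append((crop, label))
    return labelled
```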

INFORMATION PROCESSING SYSTEM AND LEARNING MODEL GENERATION METHOD
20240161443 · 2024-05-16 ·

Before recognition processing is performed, preprocessing is performed on image data acquired by a sensor, or on image data obtained by converting that image data. An information processing system according to an embodiment includes a specifying unit (201) that specifies a correction target pixel in a depth map using a first learning model, and a correction unit (202) that corrects the correction target pixel specified by the specifying unit.

METHOD, APPARATUS, DEVICE AND MEDIUM FOR PROCESSING IMAGE USING MACHINE LEARNING MODEL
20240161472 · 2024-05-16 ·

A method, device, and medium are provided for processing an image using a machine learning model that identifies at least one candidate object from an image. The model comprises: a feature extraction model for describing an association between the image and a feature of the at least one candidate object; and a classification scoring model for describing an association between the feature and a classification score of the at least one candidate object. An update parameter associated with the classification scoring model is determined based on the classification score of the at least one candidate object and a ground truth classification score of at least one ground truth object in the image. The classification scoring model is updated based on the update parameter associated with the classification scoring model. The feature extraction model is prevented from being updated with the update parameter associated with the classification scoring model.
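The key constraint above, updating the classification scoring model while preventing the feature extraction model from receiving that update, can be sketched with two stacked linear maps. The squared-error loss and manual gradient are illustrative; the point is that only the scorer's weights change.

```python
import numpy as np

def train_step(w_feat, w_score, x, target_score, lr=0.1):
    """w_feat plays the feature extraction model (frozen); w_score plays
    the classification scoring model. Only w_score receives the update
    parameter derived from the score error."""
    feat = w_feat @ x                     # feature extraction
    score = w_score @ feat                # classification score
    grad_score = 2.0 * (score - target_score) * feat  # d(loss)/d(w_score)
    w_score_new = w_score - lr * grad_score
    return w_feat, w_score_new            # w_feat returned unchanged
```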

HDR-BASED AUGMENTATION FOR CONTRASTIVE SELF-SUPERVISED LEARNING
20240161255 · 2024-05-16 ·

According to an aspect, there is provided a method that includes receiving a first image and a second image as inputs for contrastive self-supervised learning; applying a high dynamic range augmentation to the first image to generate a first pair of views; applying the high dynamic range augmentation to the second image to generate a second pair of views; applying a first convolutional neural network to the first pair of views to output a first pair of encoded representations; applying a second convolutional neural network to the second pair of views to output a second pair of encoded representations; projecting the first pair of encoded representations to form first projected representations; projecting the second pair of encoded representations to form second projected representations; and training a machine learning model using the high dynamic range augmentations and an objective function that provides contrastive self-supervised learning.
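The contrastive objective over the two projected pairs can be sketched with an InfoNCE-style loss: each view's positive is the other view of the same image, and the views of the other image serve as negatives. The temperature value and the use of cosine similarity are common choices assumed here, not details from the abstract, and the HDR augmentation and encoder/projector networks are taken as already applied.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(z1a, z1b, z2a, z2b, temperature=0.5):
    """z1a/z1b are the projected representations of the first image's two
    HDR-augmented views; z2a/z2b are the second image's. Returns the mean
    InfoNCE loss over all four views."""
    views = [z1a, z1b, z2a, z2b]
    positives = {0: 1, 1: 0, 2: 3, 3: 2}   # the other view of the same image
    total = 0.0
    for i, zi in enumerate(views):
        sims = np.array([cosine(zi, zj) / temperature
                         for j, zj in enumerate(views) if j != i])
        pos = positives[i] if positives[i] < i else positives[i] - 1
        total += -np.log(np.exp(sims[pos]) / np.exp(sims).sum())
    return total / len(views)
```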

MACHINE LEARNING OF SPATIO-TEMPORAL MANIFOLDS FOR SOURCE-FREE VIDEO DOMAIN ADAPTATION
20240161473 · 2024-05-16 ·

Methods and systems for training a model include performing spatial augmentation on an unlabeled input video to generate spatially augmented video. Temporal augmentation is performed on the input video to generate temporally augmented video. Predictions are generated, using a model that was pre-trained on a labeled dataset, for the unlabeled input video, the spatially augmented video, and the temporally augmented video. Parameters of the model are adapted using the predictions while enforcing spatial consistency, temporal consistency, and historical consistency. The model may be used for action recognition in a healthcare context, with recognition results being used for determining whether patients are performing a rehabilitation exercise correctly.
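The consistency-enforcing adaptation signal can be sketched as a disagreement penalty between the model's prediction on the original clip and its predictions on the augmented and historical versions. The mean-squared form below is an assumption for illustration; the abstract does not specify the consistency measure.

```python
import numpy as np

def consistency_loss(p_orig, p_spatial, p_temporal, p_history):
    """Mean squared disagreement between the prediction on the original
    clip and the predictions on its spatially augmented, temporally
    augmented, and historical versions; zero when all four agree."""
    comparisons = [p_spatial, p_temporal, p_history]
    return float(np.mean([np.mean((p_orig - p) ** 2) for p in comparisons]))
```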