G06V10/7753

Domain vector-based domain adaptation for object detection and instance segmentation

A computer-implemented method for domain adaptation of an object detection model includes obtaining a domain vector for a domain from one or more images in the domain, the domain vector representing a property of the domain. The domain vector is input into fully connected layers in the object detection model. A domain-specific result of the object detection model is provided as output. The method can further include computing a domain tensor and inputting the domain tensor into convolutional layers in the object detection model.
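The conditioning step above can be sketched minimally: a domain vector estimated from images in the domain is concatenated with the image features before a fully connected layer, so the head produces a domain-specific result. All dimensions, the mean-embedding estimator, and the function names here are illustrative assumptions, not the patented method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 16-dim image features, 4-dim domain vector,
# 8-dim domain-specific output.
feat_dim, dom_dim, out_dim = 16, 4, 8
W = rng.standard_normal((feat_dim + dom_dim, out_dim))
b = np.zeros(out_dim)

def domain_conditioned_head(image_features, domain_vector):
    """Concatenate the domain vector with the image features so the
    fully connected head yields a domain-specific result."""
    x = np.concatenate([image_features, domain_vector])
    return np.maximum(x @ W + b, 0.0)              # FC layer + ReLU

# The domain vector is obtained from images in the domain; here it is
# simply the mean of per-image embeddings (a stand-in estimator).
per_image_embeddings = rng.standard_normal((5, dom_dim))
domain_vector = per_image_embeddings.mean(axis=0)
out = domain_conditioned_head(rng.standard_normal(feat_dim), domain_vector)
```

A domain tensor for convolutional layers would be broadcast along spatial dimensions in the same spirit.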

System and method for vehicle occlusion detection
12552375 · 2026-02-17

A system and method for vehicle occlusion detection is disclosed. A particular embodiment includes: receiving training image data from a training image data collection system; obtaining ground truth data corresponding to the training image data; performing a training phase to train a plurality of classifiers, a first classifier being trained for processing static images of the training image data, a second classifier being trained for processing image sequences of the training image data; receiving image data from an image data collection system associated with an autonomous vehicle; and performing an operational phase including performing feature extraction on the image data, determining a presence of an extracted feature instance in multiple image frames of the image data by tracing the extracted feature instance back to a previous plurality of N frames relative to a current frame, applying the first trained classifier to the extracted feature instance if the extracted feature instance cannot be determined to be present in multiple image frames of the image data, and applying the second trained classifier to the extracted feature instance if the extracted feature instance can be determined to be present in multiple image frames of the image data.
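The routing logic in the operational phase can be sketched as follows: an extracted feature instance is traced back through the previous N frames, and the sequence classifier is applied only when it is present in all of them, otherwise the static-image classifier is used. The per-frame set representation and the function name are assumptions for illustration.

```python
def select_classifier(frame_tracks, instance_id, n, static_clf, sequence_clf):
    """Route a detected feature instance to a classifier: if the
    instance can be traced back through the previous n frames relative
    to the current frame, use the sequence classifier; otherwise fall
    back to the static-image classifier."""
    previous = frame_tracks[-(n + 1):-1]        # the n frames before the current one
    traced = len(previous) == n and all(instance_id in f for f in previous)
    return sequence_clf if traced else static_clf

# Three frames of tracked instance IDs; "car1" persists, "ped1" is new.
frames = [{"car1"}, {"car1"}, {"car1", "ped1"}]
clf_for_car = select_classifier(frames, "car1", 2, "static", "sequence")
clf_for_ped = select_classifier(frames, "ped1", 2, "static", "sequence")
```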

Training method for semi-supervised learning model, image processing method, and device

Embodiments of this application disclose a training method for a semi-supervised learning model, which can be applied to computer vision in the field of artificial intelligence. The method includes: first predicting classification categories of some unlabeled samples by using a trained first semi-supervised learning model to obtain prediction labels; and determining, in a one-bit labeling manner, whether each prediction label is correct: if the prediction is correct, a correct label (a positive label) of the sample is obtained; if the prediction is incorrect, an incorrect label (a negative label) of the sample is excluded. Then, in a next training phase, a training set (a first training set) is reconstructed based on this information, and an initial semi-supervised learning model is retrained based on the first training set to improve prediction accuracy of the model. In one-bit labeling, an annotator only needs to answer yes or no for each prediction label.
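The one-bit labeling bookkeeping can be sketched in a few lines: each yes/no answer either recovers a positive label or records an excluded negative label, and both sets feed the reconstructed first training set. The function name and tuple representation are illustrative assumptions.

```python
def reconstruct_training_set(samples, predicted_labels, annotator_answers):
    """One-bit labeling: for each unlabeled sample the annotator only
    answers yes/no to the model's predicted label. 'yes' recovers a
    positive (correct) label; 'no' excludes that label as a negative."""
    positives, negatives = [], []
    for x, label, is_correct in zip(samples, predicted_labels, annotator_answers):
        if is_correct:
            positives.append((x, label))        # correct label obtained
        else:
            negatives.append((x, label))        # this label is excluded for x
    return positives, negatives                 # basis of the first training set

pos, neg = reconstruct_training_set(["a", "b", "c"], [0, 1, 2], [True, False, True])
```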

Image detection method and apparatus

An image detection method and apparatus are disclosed. The method includes: performing feature extraction processing on an image to obtain a feature representation subset of the image comprising at least two sub-image features; generating attention weights corresponding to the at least two sub-image features; performing weighting aggregation processing on the at least two sub-image features according to the attention weights to obtain a first feature vector; performing clustering sampling processing on the at least two sub-image features to obtain at least two classification clusters comprising sampled sub-image features; determining a block sparse self-attention for each of the sampled sub-image features according to the at least two classification clusters and a block sparse matrix; determining a second feature vector according to at least two block sparse self-attentions respectively corresponding to the at least two classification clusters; and determining a classification result of the image according to the first feature vector and the second feature vector.
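The first branch above (attention weights plus weighting aggregation) can be sketched as a softmax-weighted sum over sub-image features; the learned scoring vector `w_attn` stands in for whatever attention mechanism the patent actually claims, and all names and dimensions are assumptions.

```python
import numpy as np

def attention_aggregate(sub_features, w_attn):
    """Generate one attention weight per sub-image feature and compute
    the weighted aggregation (the first feature vector)."""
    scores = sub_features @ w_attn              # (n,) unnormalised scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax attention weights
    return weights @ sub_features               # (d,) aggregated feature vector

rng = np.random.default_rng(1)
feats = rng.standard_normal((6, 4))             # six 4-dim sub-image features
first_feature_vector = attention_aggregate(feats, rng.standard_normal(4))
```

The second branch (clustering sampling and block sparse self-attention) would restrict attention to within-cluster blocks rather than attending over all pairs.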

Visual search and discovery via generative model inversion

Solutions for visual search and discovery include performing unsupervised training of a generative adversarial network that has a generator and an assessor. Training the generative adversarial network involves alternating between training the assessor (using the generator and a plurality of catalog images) and training the generator (using the assessor). The catalog images are inverted into catalog vectors by leveraging the trained generator. A query image is inverted into a query vector, and image similarity is determined by calculating a distance between the query vector and a catalog vector. In some examples, inversion is performed by training an encoder with the trained generator and inverting the catalog images with the encoder. In some examples, the trained generator is used to perform a search in a vector space. A weighting vector may be used to weight elements of the vectors, effectively prioritizing image features for the image similarity determination.
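The similarity step can be sketched directly: once the query and catalog images are inverted into vectors, a weighting vector scales each element before the distance is computed, prioritizing selected features. The Euclidean metric and function names are illustrative assumptions.

```python
import numpy as np

def weighted_distance(query_vec, catalog_vec, weights):
    """Euclidean distance after element-wise weighting, so selected
    image features are prioritised in the similarity computation."""
    diff = weights * (query_vec - catalog_vec)
    return float(np.sqrt(np.dot(diff, diff)))

def nearest_catalog_item(query_vec, catalog_vecs, weights):
    """Index of the catalog vector most similar to the query vector."""
    dists = [weighted_distance(query_vec, c, weights) for c in catalog_vecs]
    return int(np.argmin(dists))

query = np.array([1.0, 0.0])
catalog = [np.array([0.0, 5.0]), np.array([1.0, 0.1])]
best = nearest_catalog_item(query, catalog, np.ones(2))
```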

Relationship modeling and adjustment based on video data
12548329 · 2026-02-10

A method includes acquiring digital video data that portrays an interacting event, identifying a plurality of features in the digital video data with a first computer-implemented machine learning model, analyzing the plurality of features to create a baseline relationship graph, determining a target relationship graph, generating one or more actions for increasing similarity between the baseline relationship graph and the target relationship graph, and outputting the one or more actions by a user interface. The one or more actions are generated using a simulator, a second computer-implemented machine learning model, and a plurality of actions. The second computer-implemented machine learning model is configured to relate actions of the plurality of actions to changes to relationship graphs, and the simulator is configured to simulate changes to the baseline relationship graph using the second computer-implemented machine learning model and the plurality of actions.
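The action-generation loop can be sketched as a greedy search: the simulator applies each candidate action's predicted effect to the current graph and keeps the action that most increases similarity to the target. Here graphs are `{edge: weight}` dicts and `effect_model` is a plain function standing in for the second machine learning model; both are illustrative assumptions.

```python
def graph_similarity(g1, g2):
    """Negative total absolute edge-weight difference between two
    relationship graphs stored as {edge: weight} dicts."""
    edges = set(g1) | set(g2)
    return -sum(abs(g1.get(e, 0.0) - g2.get(e, 0.0)) for e in edges)

def generate_actions(baseline, target, actions, effect_model, max_steps=3):
    """Greedy simulator: at each step, apply the action whose simulated
    effect (predicted by effect_model) makes the baseline graph most
    similar to the target graph."""
    graph, chosen = dict(baseline), []
    for _ in range(max_steps):
        best = max(actions,
                   key=lambda a: graph_similarity(effect_model(graph, a), target))
        graph = effect_model(graph, best)
        chosen.append(best)
    return chosen

# Toy effect model: an action (edge, delta) shifts one edge weight.
def effect(g, action):
    edge, delta = action
    out = dict(g)
    out[edge] = out.get(edge, 0.0) + delta
    return out

baseline = {("A", "B"): 1.0}
target = {("A", "B"): 3.0}
acts = [(("A", "B"), 1.0), (("A", "B"), -1.0)]
plan = generate_actions(baseline, target, acts, effect, max_steps=2)
```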

Relationship modeling and evaluation based on video data
12548328 · 2026-02-10

A method includes acquiring digital video data that portrays an interacting event, identifying a plurality of video features in the digital video data, analyzing the plurality of video features to create a relationship graph, determining a relationship score based on the relationship graph using a first computer-implemented machine learning model, and outputting the relationship score with a user interface. The interacting event comprises a plurality of interactions between a first individual and a second individual and each video feature of the plurality of video features corresponds to an interaction of the plurality of interactions. The relationship graph comprises a first node, a second node, and a first edge extending from the first node to the second node. The first node represents the first individual, the second node represents the second individual, and a weight of the first edge represents a relationship strength between the first individual and the second individual.
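The graph construction described above can be sketched by accumulating per-interaction video features into weighted edges between individuals; the dict representation and the trivial edge-weight "score" (standing in for the first machine learning model) are illustrative assumptions.

```python
def build_relationship_graph(interactions):
    """Accumulate interactions between two individuals into a weighted
    relationship graph: nodes are individuals and each edge weight
    represents relationship strength."""
    graph = {}
    for first, second, strength in interactions:
        graph[(first, second)] = graph.get((first, second), 0.0) + strength
    return graph

def relationship_score(graph, first, second):
    """Hypothetical stand-in for the scoring model: simply the weight
    of the edge extending from first to second."""
    return graph.get((first, second), 0.0)

g = build_relationship_graph([("A", "B", 0.5), ("A", "B", 0.25)])
score = relationship_score(g, "A", "B")
```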

Object detection with cross-domain mixing

Implementations are described herein for improving unsupervised domain adaptation (UDA) by using an improved adaptive teacher for object detection with cross-domain mix-up. In various implementations, cross-domain training of an object detection machine learning model may include: performing weak augmentation on images from a target domain D.sub.T to generate a first set of weakly augmented target domain images; performing strong augmentation on images from a source domain D.sub.S and images from the target domain D.sub.T to generate a second set of strongly augmented images; processing the second set of strongly augmented images to generate a third set of inter-domain mixes of the images from D.sub.S and D.sub.T; and jointly training the object detection machine learning model, as a student machine learning model, with a teacher machine learning model using the first and third sets.
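The inter-domain mixing step can be sketched as a standard mix-up: a pixel-wise convex combination of a strongly augmented source image and a strongly augmented target image. The mixing coefficient `lam` and the function name are assumptions; the patent's exact mixing scheme may differ.

```python
import numpy as np

def interdomain_mixup(strong_source_img, strong_target_img, lam=0.5):
    """Convex combination of a strongly augmented source-domain image
    and a strongly augmented target-domain image (cross-domain mix-up)."""
    return lam * strong_source_img + (1.0 - lam) * strong_target_img

src = np.zeros((2, 2, 3))                       # toy D_S image
tgt = np.ones((2, 2, 3))                        # toy D_T image
mixed = interdomain_mixup(src, tgt, lam=0.25)   # 0.25 * src + 0.75 * tgt
```

The weakly augmented target images (first set) would feed the teacher for pseudo-labels, while the mixes (third set) train the student.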

Multi-dimension unified swin transformer for lesion segmentation

A system and method of multi-stage training of a transformer-based machine-learning model. The system performs at least two of the following three training stages: During a first stage, the system pre-trains a transformer encoder via a first machine-learning network using an unlabeled 3D image dataset. During a second stage, the system fine-tunes the pre-trained transformer encoder via a second machine-learning network using a labeled 2D image dataset. During a third stage, the system further fine-tunes the previously pre-trained or fine-tuned transformer encoder via a third machine-learning network using a labeled 3D image dataset.
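The staged pipeline can be sketched as a driver that runs whichever of the three stages have data available and enforces the "at least two stages" requirement; the stage names, the function-per-stage interface, and the dict-based configuration are illustrative assumptions.

```python
def multi_stage_training(encoder, stages, data):
    """Run at least two of the three stages in order: 3-D pre-training,
    2-D fine-tuning, 3-D fine-tuning. `stages` maps a stage name to a
    function (encoder, dataset) -> encoder; `data` maps a stage name to
    its dataset, or None to skip that stage."""
    order = ("pretrain_3d_unlabeled", "finetune_2d_labeled", "finetune_3d_labeled")
    ran = []
    for name in order:
        if data.get(name) is not None:
            encoder = stages[name](encoder, data[name])
            ran.append(name)
    if len(ran) < 2:
        raise ValueError("at least two of the three stages are required")
    return encoder, ran

# Toy stage functions that just count how many stages touched the encoder.
stage_fns = {name: (lambda enc, ds: enc + 1)
             for name in ("pretrain_3d_unlabeled",
                          "finetune_2d_labeled",
                          "finetune_3d_labeled")}
datasets = {"pretrain_3d_unlabeled": ["vol1"],
            "finetune_2d_labeled": ["slice1"],
            "finetune_3d_labeled": None}        # third stage skipped
enc, ran = multi_stage_training(0, stage_fns, datasets)
```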

NATURAL AUGMENTATION OF IMAGE TRAINING DATASETS
20260038245 · 2026-02-05

In some embodiments, a method for training a machine learning model includes: ascertaining geographic location information of at least one portion of a first image associated with a label; associating with the label a second image including at least a portion having substantially the same geographic location information as the at least one portion of the first image; optionally aligning or coregistering the first and second images to maximize mutual information overlap; forming a training dataset comprising the first and second images as input images and the label that the first and second images are associated with as output; optionally performing binary categorization and curation of the resulting training dataset to ensure accuracy; and training the machine learning model using the augmented training dataset.
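The core pairing step can be sketched as label propagation by geographic co-location: every unlabeled image whose location matches a labeled image inherits that image's label, growing the training dataset without new annotation. Exact-match comparison stands in for "substantially the same", and all names are illustrative assumptions.

```python
def natural_augmentation(labeled, unlabeled):
    """Augment a training dataset by associating each label with every
    unlabeled image that shares substantially the same geographic
    location as the labeled image."""
    dataset = []
    for image_id, geo, label in labeled:
        dataset.append((image_id, label))
        for other_id, other_geo in unlabeled:
            if other_geo == geo:                # "substantially the same" location
                dataset.append((other_id, label))
    return dataset

labeled = [("img_a", (10.0, 20.0), "road")]
unlabeled = [("img_b", (10.0, 20.0)), ("img_c", (0.0, 0.0))]
augmented = natural_augmentation(labeled, unlabeled)
```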