G06V10/7753

Unsupervised pre-training of neural networks using generative models

In various examples, systems and methods are disclosed relating to generating a response from image and/or video input for image/video-based artificial intelligence (AI) systems and applications. Systems and methods are disclosed for a first model (e.g., a teacher model) distilling its knowledge into a second model (a student model). The second model receives a downstream image in a downstream task and generates at least one feature. The first model generates first features corresponding to an image, which can be a real image or a synthetic image. The second model generates second features using the image as an input to the second model. A loss of the second features with respect to the first features is determined, and the second model is updated using the loss.
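
A minimal sketch of the distillation step described in this abstract, assuming generic PyTorch `teacher` and `student` feature extractors; the module names, the MSE loss, and the optimizer are illustrative choices, not the claimed method:

```python
# Feature-distillation sketch: the first (teacher) model's features supervise
# the second (student) model on the same image.
import torch
import torch.nn.functional as F

def distill_step(teacher, student, image, optimizer):
    """One update of the student from the teacher's features on `image`."""
    with torch.no_grad():
        first_features = teacher(image)       # first features from the first model
    second_features = student(image)          # second features from the second model
    loss = F.mse_loss(second_features, first_features)  # loss of second w.r.t. first features
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                          # update the second model using the loss
    return loss.item()
```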

Self-supervised learning for medical image quality control

Provided herein are methods for automated image quality control (QC). The method comprises: generating training data based at least in part on metadata obtained from a data augmentation process; and training a model for a QC task based at least in part on the training data. The model is trained using a self-supervised learning algorithm.
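
One plausible reading of the pretext task, sketched below: the augmentation metadata (here, a hypothetical blur level) becomes the self-supervised label, so no human annotation is needed. The `model`, the choice of degradation, and the scalar regression target are assumptions, not the claimed QC method:

```python
# Self-supervised QC sketch: labels come from augmentation metadata.
import random
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def make_training_pair(image):
    sigma = random.uniform(0.1, 3.0)                   # augmentation metadata
    degraded = TF.gaussian_blur(image, kernel_size=9, sigma=sigma)
    return degraded, torch.tensor([sigma])             # (input, self-supervised label)

def train_step(model, image, optimizer):
    # `model` is assumed to map a (1, C, H, W) image to a single quality score.
    x, y = make_training_pair(image)
    pred = model(x.unsqueeze(0))
    loss = F.mse_loss(pred, y.unsqueeze(0))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```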

NETWORK-TRAINED NEURAL NETWORKS AND ADVERSARIAL-TRAINED NEURAL NETWORKS

A system for training a student neural network using a trained supervisory neural network. The system includes at least one processor comprising circuitry and a memory. The memory includes instructions that when executed by the circuitry cause the at least one processor to: receive an image including a representation of a feature of interest, provide the image as input to the trained supervisory neural network, provide the image as input to the student neural network, receive a first output from the trained supervisory neural network indicative of at least one characteristic of the feature of interest, receive a second output from the student neural network indicative of the at least one characteristic of the feature of interest, compare the first output to the second output, and based on a detected difference between the first output and the second output, automatically update at least one aspect of the student neural network.
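
A compact sketch of the compare-and-update loop described above, assuming hypothetical `supervisory` and `student` networks that map an image to the same characteristic (e.g., a score vector); the MSE comparison and the tolerance threshold are illustrative:

```python
# Supervisory-network training sketch: update the student only when a
# difference between the two outputs is detected.
import torch
import torch.nn.functional as F

def supervise_step(supervisory, student, image, optimizer, tol=1e-3):
    with torch.no_grad():
        first_output = supervisory(image)     # trained supervisory neural network
    second_output = student(image)            # student neural network
    difference = F.mse_loss(second_output, first_output)
    if difference.item() > tol:               # detected difference triggers an update
        optimizer.zero_grad()
        difference.backward()
        optimizer.step()                      # automatically update the student network
    return difference.item()
```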

Generation of semantically modified variations of images with transformer networks
12469283 · 2025-11-11

A method for generating a semantically modified variation of an image. In the method: the image is divided into equally sized, non-overlapping patches; the patches are converted with a patch encoding function of a transformer network into a chain of tokens in a workspace; the tokens are grouped into preservation tokens, whose information is to be preserved in the variation, and masked tokens, whose information is to be masked in the variation; the preservation tokens are converted with an encoder of the transformer network into a chain of processed tokens; this chain is supplemented through application of an insertion operator that inserts masked tokens at positions corresponding to the positions of the masked tokens in the original chain, to form a chain that represents the sought variation; the supplemented chain is converted with a decoder of the transformer network into the sought variation.
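
A schematic of the patchify / group / encode / insert / decode pipeline, assuming generic `encoder` and `decoder` transformer modules, a `patch_embed` function, and a learned `mask_token` of the decoder's width; shapes, names, and the use of a single learned mask token are assumptions:

```python
# Masked-token variation sketch: only preservation tokens pass through the
# encoder; mask tokens are inserted at the masked positions before decoding.
import torch

def patchify(image, size):
    b, c, h, w = image.shape
    p = image.unfold(2, size, size).unfold(3, size, size)   # non-overlapping patches
    return p.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * size * size)

def generate_variation(image, patch_embed, encoder, decoder, mask_token, keep_idx, mask_idx):
    patches = patchify(image, size=16)            # equally sized, non-overlapping patches
    tokens = patch_embed(patches)                 # chain of tokens in the workspace
    kept = encoder(tokens[:, keep_idx])           # process the preservation tokens only
    n = tokens.shape[1]
    chain = kept.new_empty(tokens.shape[0], n, kept.shape[-1])
    chain[:, keep_idx] = kept                     # processed tokens at their positions
    chain[:, mask_idx] = mask_token               # insertion operator at the masked positions
    return decoder(chain)                         # decode the supplemented chain into the variation
```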

Bayesian semantic segmentation active learning with beta approximation

Training of a machine vision model, a segmentation model, is performed by using an acquisition function for a small number of pixels of one or more training images. The acquisition function uses first mutual information and second mutual information to identify unlabelled pixels whose possible label values are predicted with high uncertainty. Training, prediction of labels, identifying pixels with highly uncertain labels, obtaining labels only for those pixels with highly uncertain labels, and retraining are performed iteratively to finally provide the machine vision model. The iterative approach uses very few labelled pixels to obtain the final machine vision model. The machine vision model accurately labels areas of a data image.
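
A sketch of a per-pixel acquisition score computed from Monte Carlo samples of the segmentation model's softmax output; the claim's distinct "first" and "second" mutual information terms are collapsed here into a single BALD-style mutual information score, which is an assumption rather than the patented formulation:

```python
# Active-learning acquisition sketch: rank unlabelled pixels by mutual
# information and request labels only for the top-k most uncertain ones.
import numpy as np

def acquisition_scores(probs):
    """probs: (T, H, W, C) softmax outputs from T stochastic forward passes."""
    mean_p = probs.mean(axis=0)                                      # (H, W, C)
    entropy_mean = -(mean_p * np.log(mean_p + 1e-12)).sum(axis=-1)   # predictive entropy
    mean_entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1).mean(axis=0)
    return entropy_mean - mean_entropy                               # mutual information per pixel

def select_pixels(probs, labelled_mask, k):
    scores = acquisition_scores(probs)
    scores[labelled_mask] = -np.inf                                  # consider unlabelled pixels only
    flat = np.argsort(scores, axis=None)[::-1][:k]                   # k most uncertain pixels
    return np.unravel_index(flat, scores.shape)
```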

Relationship modeling and key feature detection based on video data
12494058 · 2025-12-09

A method includes acquiring digital video data that portrays an interacting event, extracting image data, audio data, and semantic text data from the video data, analyzing the extracted data to identify a plurality of video features, and analyzing the plurality of video features to create a relationship graph. The interacting event comprises a plurality of interactions between a plurality of individuals, and the relationship graph comprises a plurality of nodes and a plurality of edges. Each node of the plurality of nodes represents an individual of the plurality of individuals, each edge of the plurality of edges extends between two nodes of the plurality of nodes, and the plurality of edges represents the plurality of interactions. The method further comprises determining whether a first key feature is present in the relationship graph, wherein presence of the first key feature is predictive of a positive outcome of the interacting event.
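
A toy construction of the relationship graph described above: nodes are individuals, edges carry the interactions between them, and a hypothetical `has_key_feature` predicate inspects the graph. The feature-extraction stage and the specific key feature ("eye_contact") are illustrative placeholders:

```python
# Relationship-graph sketch built from already-extracted interaction tuples.
from collections import defaultdict

def build_relationship_graph(interactions):
    """interactions: iterable of (person_a, person_b, interaction_type)."""
    nodes = set()
    edges = defaultdict(list)
    for a, b, kind in interactions:
        nodes.update((a, b))                    # each node represents an individual
        edges[frozenset((a, b))].append(kind)   # each edge represents interactions between two nodes
    return nodes, edges

def has_key_feature(edges, feature="eye_contact"):
    # Presence of the key feature is treated as predictive of a positive outcome.
    return any(feature in kinds for kinds in edges.values())
```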

METHOD OF GENERATING HIGHLY CONSISTENT PREDICTED VALUES FROM PLANAR IMAGES
20250378680 · 2025-12-11

The present invention relates to a method of training a prediction model to generate a main predicted value of a main feature of an input image. The method comprises training the prediction model with a primary dataset containing labeled training images labeled with ground truth values and a secondary dataset containing two unlabeled training images without ground truth values. The training goal is to reduce both a first loss and a second loss, wherein the first loss calculates the difference between predicted values of the labeled training images and the ground truth values, and the second loss calculates the difference between predicted values of the two unlabeled training images.
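
A combined-loss sketch of the training goal: a supervised term on the labeled images plus a consistency term between the two unlabeled images. The `model`, the MSE losses, and the weighting factor `w` are illustrative assumptions:

```python
# Semi-supervised consistency sketch: reduce both losses jointly.
import torch
import torch.nn.functional as F

def total_loss(model, labeled_image, ground_truth, unlabeled_a, unlabeled_b, w=1.0):
    first_loss = F.mse_loss(model(labeled_image), ground_truth)        # prediction vs. ground truth
    second_loss = F.mse_loss(model(unlabeled_a), model(unlabeled_b))   # difference between the two unlabeled predictions
    return first_loss + w * second_loss                                # reduce both the first and second loss
```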

SYSTEM FOR DETECTING FALLING OBJECT ON TABLE AND SERVER INCLUDED THEREIN
20250378547 · 2025-12-11

A system for detecting a falling object on a table includes a camera unit configured to obtain at least one image of a surface of the table, and a server. The server is configured to: detect, from the at least one image, a first state in which an object is on the surface of the table or a second state in which no object is on the surface of the table; obtain, when the first state is detected, information on a first time during which the first state continues; detect that a falling object is on the surface of the table when the first time is more than a first reference time; and detect that no falling object is on the surface of the table when the first time is the first reference time or less.
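
A plain-Python sketch of the timing rule on the server side: an object is flagged only when the object-present state persists beyond the reference time. Per-frame detection of whether an object is on the surface is assumed to be given by the camera/vision stage:

```python
# Timing-rule sketch for the server's falling-object decision.
import time

class TableMonitor:
    def __init__(self, reference_seconds=30.0):
        self.reference = reference_seconds       # first reference time
        self.first_state_since = None            # when the "object on table" state began

    def update(self, object_on_table: bool, now=None):
        now = time.monotonic() if now is None else now
        if object_on_table:                      # first state: an object is on the surface
            if self.first_state_since is None:
                self.first_state_since = now
            elapsed = now - self.first_state_since
            return elapsed > self.reference      # falling object only after the reference time
        self.first_state_since = None            # second state: no object on the surface
        return False
```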

Unsupervised pre-training of geometric vision models

A method includes: performing unsupervised pre-training of a model, the model including an encoder and a decoder, the pre-training including: obtaining a first image and a second image under different conditions or from different viewpoints; encoding, by the encoder, the first image into a representation of the first image and the second image into a representation of the second image; transforming the representation of the first image into a transformed representation; decoding, by the decoder, the transformed representation into a reconstructed image, where the transforming of the representation of the first image and the decoding of the transformed representation are based on the representation of the first image and the representation of the second image; and adjusting one or more parameters of at least one of the encoder and the decoder based on minimizing a loss; and fine-tuning the model, initialized with a set of task-specific encoder parameters, for a geometric vision task.
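
A sketch of one pre-training step under the two-view scheme above, assuming a shared `encoder`, a `transform` module conditioned on both representations, and a `decoder`; the reconstruction target (the first image) and the MSE loss are assumptions, since the abstract only states that a loss is minimized:

```python
# Two-view unsupervised pre-training sketch for a geometric vision backbone.
import torch
import torch.nn.functional as F

def pretrain_step(encoder, transform, decoder, image1, image2, optimizer):
    rep1 = encoder(image1)                            # representation of the first image
    rep2 = encoder(image2)                            # representation of the second image
    transformed = transform(rep1, rep2)               # transform rep1, conditioned on both representations
    reconstructed = decoder(transformed, rep1, rep2)  # decode into a reconstructed image
    loss = F.mse_loss(reconstructed, image1)          # minimize a reconstruction loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```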

Method and system to augment images and labels to be compatible with various machine learning models

A method includes obtaining an authentic image of an assembly and a boundary label provided with the authentic image. The boundary label is associated with a selected region of the authentic image depicting a selected object. The method includes generating an augmented image based on the authentic image and an augmentation model employing one or more augmentation parameters, defining a model boundary label on a blank image at a region that correlates with the selected region of the authentic image, generating an augmented blank image based on the one or more augmentation parameters employed for the augmented image, identifying, as an augmented boundary label associated with the augmented image, the model boundary label in the augmented blank image, and outputting augmented image data, wherein the augmented image data includes data indicative of the augmented image and of the augmented boundary label.
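
A sketch of the blank-image trick for carrying a boundary label through an augmentation: draw the label on a blank canvas, apply the same parameterized augmentation to both images, and read the transformed label back out. The OpenCV rotation is an illustrative stand-in for the augmentation model, and the rectangular box format is an assumption:

```python
# Label-preserving augmentation sketch using a blank companion image.
import numpy as np
import cv2

def augment_with_label(authentic_image, box, angle_deg=15.0):
    h, w = authentic_image.shape[:2]
    blank = np.zeros((h, w), dtype=np.uint8)
    x0, y0, x1, y1 = box
    cv2.rectangle(blank, (x0, y0), (x1, y1), 255, thickness=-1)    # model boundary label on a blank image

    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)    # shared augmentation parameters
    augmented_image = cv2.warpAffine(authentic_image, m, (w, h))   # augmented image
    augmented_blank = cv2.warpAffine(blank, m, (w, h))             # augmented blank image, same parameters

    ys, xs = np.nonzero(augmented_blank)                           # recover the augmented boundary label
    augmented_box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return augmented_image, augmented_box
```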