Patent classifications
G06V10/7753
Relationship modeling and anomaly detection based on video data
A method includes acquiring digital video data that portrays an interacting event, identifying a plurality of features in the digital video data, and analyzing the plurality of features to create a relationship graph. The relationship graph comprises a plurality of nodes and a plurality of edges. Each node of the plurality of nodes represents an individual of a plurality of individuals, each edge of the plurality of edges extends between two nodes of the plurality of nodes, and the plurality of edges represents a plurality of interactions of the interacting event. The method further includes identifying an edge of the plurality of edges as an anomalous edge, creating an output representative of the anomalous edge, and outputting the output representative of the anomalous edge. The anomalous edge is identified by a computer-implemented machine learning model configured to identify anomalous edges in relationship graphs.
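As a rough illustration of the graph structure described above, the following Python sketch builds a weighted relationship graph from a list of observed interactions and flags unusual edges. The abstract specifies a machine learning model for the anomaly step; the z-score rule used here is only a simple stand-in, and the names `build_relationship_graph`, `anomalous_edges`, and `z_threshold`, as well as the sample data, are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def build_relationship_graph(interactions):
    """Return edge weights: number of interactions per unordered pair of individuals."""
    edges = Counter()
    for a, b in interactions:
        edges[frozenset((a, b))] += 1
    return edges


def anomalous_edges(edges, z_threshold=1.5):
    """Flag edges whose weight deviates strongly from the mean edge weight.

    Stand-in for the learned model in the abstract (assumption, not the patented method).
    """
    weights = list(edges.values())
    mu, sigma = mean(weights), pstdev(weights)
    if sigma == 0:
        return []  # all edges look alike, nothing to flag
    return [tuple(sorted(edge)) for edge, w in edges.items()
            if abs(w - mu) / sigma > z_threshold]


# Hypothetical interactions extracted from video features.
interactions = (
    [("ana", "ben")] * 3 + [("ben", "cy")] * 3 + [("ana", "cy")] * 3
    + [("cy", "dee")] * 3 + [("ana", "dee")] * 12
)
graph = build_relationship_graph(interactions)
flagged = anomalous_edges(graph)
print(flagged)
```

In this toy data only the pair with twelve interactions stands out from the other edges, so it is the only edge reported.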
Unsupervised domain adaptation with neural networks
Approaches presented herein provide for unsupervised domain transfer learning. In particular, three neural networks can be trained together using at least labeled data from a first domain and unlabeled data from a second domain. Features of the data are extracted using a feature extraction network. A first classifier network uses these features to classify the data, while a second classifier network uses these features to determine the relevant domain. A combined loss function is used to optimize the networks, with the goal that the feature extraction network extracts features that the first classifier network can use to accurately classify the data, but that prevent the second classifier network from determining the domain of the data. Such optimization enables object classification to be performed with high accuracy for either domain, even though there may have been little to no labeled training data for the second domain.
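One way to picture the combined loss described above is the following Python sketch of the objective seen by the feature extraction network. It assumes a domain-adversarial form (classification loss minus a weighted domain loss); the abstract does not give the exact formula, so the function names and the weight `lam` are illustrative assumptions.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a probability vector."""
    return -math.log(probs[target])


def combined_loss(class_probs, class_label, domain_probs, domain_label, lam=0.1):
    """Feature-extractor objective (assumed form): classify well, confuse the domain classifier.

    class_label is None for unlabeled second-domain samples, which then
    contribute only through the domain term.
    """
    cls_loss = 0.0 if class_label is None else cross_entropy(class_probs, class_label)
    dom_loss = cross_entropy(domain_probs, domain_label)
    # Subtracting the domain loss rewards features that hide the domain.
    return cls_loss - lam * dom_loss


# Same class prediction, two different domain-classifier outputs.
confused = combined_loss([0.8, 0.2], 0, [0.5, 0.5], 0)
confident = combined_loss([0.8, 0.2], 0, [0.9, 0.1], 0)
print(confused < confident)
```

Under this assumed form, features that leave the domain classifier at chance yield a lower objective than features that let it identify the domain confidently.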
Human-centric visual diversity auditing
A methodology for auditing the visual diversity of unlabeled human face image datasets uses a set of core human interpretable dimensions derived from human similarity judgments. Given a face image, a model can output dimensional values aligned with the human mental representational space of faces, where values not only express the presence of a feature, but also its extent. Since the model can be learned entirely from human behavior, the learned dimensions are not biased toward features that are easier to verbalize or quantify.
Adaptive human instance segmentation with stereo view consistency
A system stores first and second images generated by first and second cameras; applies a segmentation model to the first image to generate a first segmentation mask identifying object instances; applies the segmentation model to the second image to generate a second segmentation mask identifying the object instances; projects the first segmentation mask to a viewpoint of the second camera to generate a first projected segmentation mask; converts the first projected segmentation mask and the second segmentation mask to first and second semantic masks, respectively; and computes a first similarity value based on the first and second semantic masks. This may be repeated exchanging the first and second images to compute a second similarity value. The system determines a loss value based on the first similarity value and the second similarity value and trains the segmentation model based on the loss value.
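The consistency computation described above can be sketched in Python as follows. This is a toy version under strong assumptions: projection between camera viewpoints is approximated by a horizontal pixel shift (`disparity`), similarity is intersection-over-union, and the loss is one minus the mean of the two similarities. None of these choices are stated in the abstract, and all function names are hypothetical.

```python
def to_semantic(instance_mask):
    """Collapse instance ids (>0) to a binary foreground/background mask."""
    return [[1 if v > 0 else 0 for v in row] for row in instance_mask]


def project(mask, shift):
    """Shift every row by `shift` columns (positive = right), padding with 0.

    Stand-in for a real reprojection using depth and camera calibration.
    """
    width = len(mask[0])
    out = []
    for row in mask:
        new_row = [0] * width
        for x, v in enumerate(row):
            if 0 <= x + shift < width:
                new_row[x + shift] = v
        out.append(new_row)
    return out


def iou(a, b):
    """Intersection-over-union of two binary masks of equal shape."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 1.0


def consistency_loss(mask_first, mask_second, disparity):
    """Project each mask into the other view, compare, and average both directions."""
    sim_1 = iou(to_semantic(project(mask_first, -disparity)), to_semantic(mask_second))
    sim_2 = iou(to_semantic(project(mask_second, disparity)), to_semantic(mask_first))
    return 1.0 - 0.5 * (sim_1 + sim_2)


# Two instances seen one pixel further left by the second camera.
first = [[0, 0, 1, 1], [0, 0, 2, 2]]
second = [[0, 1, 1, 0], [0, 3, 3, 0]]
print(consistency_loss(first, second, disparity=1))
```

Converting to semantic masks before comparison means the two views may number the instances differently, as in the example, without affecting the similarity.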
Computer-readable recording medium storing object detection program, device, and machine learning model generation method of training object detection model to detect category and position of object
A recording medium storing a program for causing a computer to execute processing including: acquiring, from a first model trained based on training data in which a first object is labeled in an image, a first portion that specifies a region in an image that includes the first object; generating a third model by combining the first portion with a third portion of a second model, the second model including a second portion and the third portion and being trained based on training data in which position information regarding a second object is labeled in an image, the second portion specifying a region in an image that includes the second object, and the third portion determining a position in an image of a specified region; and outputting a detection result of an object by inputting an image to the third model.
Partial labeling mechanism for quick and accurate training of machine learning models
A device generates a training set of images by, for each image of a plurality of training images, receiving user input of a set of labels for a portion of the image, the portion less than an entirety of the image, the set of labels comprising classifications of individual pixels within the image, and automatically applying a label of unknown to a remainder of the image that excludes the portion of the image. The device inputs an unlabeled image into a machine learning model, the machine learning model trained using the training set, and receives, as output from the machine learning model, predicted classifications for each pixel of the image.
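The partial labeling scheme described above can be illustrated with a short Python sketch. It assumes, beyond what the abstract states, that pixels labeled unknown are simply excluded from the training loss; the sentinel `UNKNOWN` and both function names are hypothetical.

```python
import math

UNKNOWN = -1  # assumed sentinel for pixels the user did not label


def complete_labels(height, width, user_labels):
    """Build a full label grid: user-provided classes where given, UNKNOWN elsewhere."""
    grid = [[UNKNOWN] * width for _ in range(height)]
    for (row, col), cls in user_labels.items():
        grid[row][col] = cls
    return grid


def masked_cross_entropy(pred_probs, labels):
    """Average per-pixel cross-entropy over labeled pixels only (assumed training rule)."""
    total, count = 0.0, 0
    for prob_row, label_row in zip(pred_probs, labels):
        for probs, label in zip(prob_row, label_row):
            if label == UNKNOWN:
                continue  # unlabeled pixels contribute nothing to the loss
            total += -math.log(probs[label])
            count += 1
    return total / count if count else 0.0


# A 2x2 image where the user labeled a single pixel as class 1.
labels = complete_labels(2, 2, {(0, 0): 1})
uniform = [[[0.5, 0.5] for _ in range(2)] for _ in range(2)]
print(labels, masked_cross_entropy(uniform, labels))
```

At inference time the model would still output a class for every pixel; the unknown label only affects which pixels supervise training in this sketch.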
Unsupervised pre-training of neural networks using generative models
In various examples, systems and methods are disclosed relating to generating a response from image and/or video input for image/video-based artificial intelligence (AI) systems and applications. Systems and methods are disclosed for a first model (e.g., a teacher model) distilling its knowledge to a second model (e.g., a student model). The second model receives a downstream image in a downstream task and generates at least one feature. The first model generates first features corresponding to an image, which can be a real image or a synthetic image. The second model generates second features using the image as an input to the second model. A loss of the second features with respect to the first features is determined. The second model is updated using the loss.
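A single distillation update of the kind described above might look like the following Python sketch. Both models are reduced to toy linear maps, the loss is assumed to be mean squared error between feature vectors, and the update is plain gradient descent; the abstract does not specify any of these, and the names `features`, `mse`, and `distill_step` are illustrative.

```python
def features(weights, image):
    """Toy linear 'network': one output feature per weight row."""
    return [sum(w * p for w, p in zip(row, image)) for row in weights]


def mse(a, b):
    """Mean squared error between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distill_step(student_w, teacher_w, image, lr=0.1):
    """One gradient step pulling the student features toward the teacher features."""
    first = features(teacher_w, image)   # teacher output, treated as a fixed target
    second = features(student_w, image)  # student output
    loss = mse(second, first)
    n = len(second)
    # d(loss)/d(w_ij) = 2 * (second_i - first_i) / n * image_j
    new_w = [
        [w - lr * 2 * (s - t) / n * p for w, p in zip(row, image)]
        for row, s, t in zip(student_w, second, first)
    ]
    return new_w, loss


image = [1.0, 0.5]                  # real or synthetic image, flattened
teacher = [[1.0, 0.0], [0.0, 1.0]]
student = [[0.0, 0.0], [0.0, 0.0]]
student, loss_before = distill_step(student, teacher, image)
_, loss_after = distill_step(student, teacher, image)
print(loss_before, loss_after)
```

Only the student weights change; the teacher is never updated, which matches the one-directional transfer in the abstract.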
Foundation model pre-training using self-supervised learning for autonomous and semi-autonomous systems and applications
In various examples, self-supervised learning may be used to pre-train an encoder network of a masked prediction model to reconstruct masked regions of an input representation of 3D detections such as LiDAR point cloud(s). Spatial and/or temporal masking may be applied to a projected representation of 3D detections (e.g., a two-dimensional (2D) projection image), and the masked prediction model (e.g., a masked auto-encoder or joint-embedding predictive architecture) may be used to reconstruct a representation of the masked regions (e.g., reflection characteristic(s) stored in corresponding pixels or cells of the projected representation, a latent representation of the reflection characteristic(s)) during iterations of self-supervised learning. As such, the pre-trained encoder network of the masked prediction model may be used as a foundation model and fine-tuned with a task-specific output head or its pre-trained weights may be used to initialize a task-specific model.
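The spatial masking and reconstruction objective described above can be sketched in Python as follows. The sketch assumes square patches, a fixed masking ratio, zero as the mask value, and a mean squared error computed only over masked cells; these details and the function names are assumptions rather than the disclosed design.

```python
import random


def mask_patches(grid, patch, ratio, rng):
    """Zero out a random subset of patch-sized blocks in a 2D projection grid.

    Returns the masked copy and the set of (row, col) cells that were hidden.
    """
    height, width = len(grid), len(grid[0])
    corners = [(r, c) for r in range(0, height, patch) for c in range(0, width, patch)]
    hidden = rng.sample(corners, int(len(corners) * ratio))
    masked = [row[:] for row in grid]
    cells = set()
    for r0, c0 in hidden:
        for r in range(r0, min(r0 + patch, height)):
            for c in range(c0, min(c0 + patch, width)):
                masked[r][c] = 0.0
                cells.add((r, c))
    return masked, cells


def masked_reconstruction_loss(pred, target, cells):
    """Mean squared error restricted to the masked cells."""
    if not cells:
        return 0.0
    return sum((pred[r][c] - target[r][c]) ** 2 for r, c in cells) / len(cells)


# Hypothetical 4x4 projection image holding one reflection value per cell.
grid = [[1.0] * 4 for _ in range(4)]
masked, cells = mask_patches(grid, patch=2, ratio=0.5, rng=random.Random(0))
print(len(cells), masked_reconstruction_loss(masked, grid, cells))
```

A real encoder would replace the trivial "prediction" used here (the masked input itself); the point is only that the loss is scored on the hidden regions.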
Machine learning of spatio-temporal manifolds for source-free video domain adaptation
Methods and systems for training a model include performing spatial augmentation on an unlabeled input video to generate spatially augmented video. Temporal augmentation is performed on the input video to generate temporally augmented video. Predictions are generated, using a model that was pre-trained on a labeled dataset, for the unlabeled input video, the spatially augmented video, and the temporally augmented video. Parameters of the model are adapted using the predictions while enforcing spatial consistency, temporal consistency, and historical consistency. The model may be used for action recognition in a healthcare context, with recognition results being used for determining whether patients are performing a rehabilitation exercise correctly.
Methods for characterizing and treating a cancer type using cancer images
Described herein are methods, systems, devices, and computer program products for characterizing or identifying a type of cancer. Also described are methods of treating a characterized or identified cancer. For example, certain methods may be used to characterize a homologous recombination deficiency status of a cancer.