G06V10/7753

TEMPORAL CONTRASTIVE LEARNING FOR SEMI-SUPERVISED VIDEO ACTION RECOGNITION
20230138254 · 2023-05-04 ·

A base pathway of a computerized two-pathway video action recognition model is trained using a plurality of labeled video samples. The base pathway is trained using a plurality of unlabeled video samples at a first framerate. An auxiliary pathway of the computerized two-pathway video action recognition model is trained using a plurality of the unlabeled video samples at a second framerate, the second framerate being slower than the first framerate, wherein the training of the base pathway and the training of the auxiliary pathway result in a trained computerized two-pathway video action recognition model. A candidate video is categorized using the trained computerized two-pathway video action recognition model and the categorized candidate video is stored in a computer-accessible video database system for information retrieval.
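As a rough illustration of the two-framerate idea in this abstract, the sketch below feeds the same video to a base pathway at full framerate and to an auxiliary pathway at a slower framerate (larger temporal stride). All names are hypothetical and the "encoder" is a trivial mean-pool placeholder, not the patented model:

```python
import numpy as np

def sample_clip(video, framerate_stride):
    """Temporally subsample a video of shape (T, H, W) by taking every Nth frame."""
    return video[::framerate_stride]

def pathway_features(clip):
    # Placeholder encoder: flatten each frame, then mean-pool over time.
    return clip.reshape(clip.shape[0], -1).mean(axis=0)

def two_pathway_features(video, base_stride=1, aux_stride=4):
    """Base pathway sees the faster (first) framerate; the auxiliary
    pathway sees the same video at a slower (second) framerate."""
    base = pathway_features(sample_clip(video, base_stride))
    aux = pathway_features(sample_clip(video, aux_stride))
    return np.concatenate([base, aux])

video = np.random.rand(32, 8, 8)   # 32 frames of 8x8 "pixels"
feat = two_pathway_features(video)
```

In the abstract, the two pathways are trained (with labeled and unlabeled samples) rather than merely concatenated; the sketch only shows how one video yields two differently-sampled views.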

METHODS, SYSTEMS, ARTICLES OF MANUFACTURE AND APPARATUS TO GENERATE DIGITAL SCENES
20230007970 · 2023-01-12 ·

Methods, systems, articles of manufacture and apparatus to generate digital scenes are disclosed. An example apparatus to generate labelled models includes a map builder to generate a three-dimensional (3D) model of an input image, a grouping classifier to identify a first zone of the 3D model corresponding to a first type of grouping classification, a human model builder to generate a quantity of placeholder human models corresponding to the first zone, a coordinate engine to assign the quantity of placeholder human models to respective coordinate locations of the first zone, the respective coordinate locations assigned based on the first type of grouping classification, a model characteristics modifier to assign characteristics associated with an aspect type to respective ones of the quantity of placeholder human models, and an annotation manager to associate the assigned characteristics as label data for respective ones of the quantity of placeholder human models.
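A minimal sketch of the placement step described above, assuming (hypothetically) that each grouping classification maps to a placeholder-human density and that zones are axis-aligned rectangles; the zone types, counts, and label fields are all illustrative, not from the patent:

```python
import random

random.seed(0)

# Hypothetical grouping classifications and how many placeholder humans each gets.
ZONE_DENSITY = {"queue": 4, "crowd": 8, "sparse": 2}

def place_placeholder_humans(zone_type, zone_bounds):
    """Generate placeholder human models at coordinate locations inside the
    zone, with the quantity driven by the grouping classification."""
    (x0, y0), (x1, y1) = zone_bounds
    models = []
    for i in range(ZONE_DENSITY[zone_type]):
        models.append({
            "id": i,
            "coords": (random.uniform(x0, x1), random.uniform(y0, y1)),
            # Assigned characteristics doubling as annotation label data.
            "labels": {"pose": random.choice(["standing", "walking"])},
        })
    return models

humans = place_placeholder_humans("queue", ((0.0, 0.0), (5.0, 2.0)))
```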

Data Object Classification Using an Optimized Neural Network

A system includes a computing platform having a hardware processor and a memory storing a software code and a neural network (NN) having multiple layers including a last activation layer and a loss layer. The hardware processor executes the software code to identify different combinations of layers for testing the NN, each combination including candidate function(s) for the last activation layer and candidate function(s) for the loss layer. For each different combination, the software code configures the NN based on the combination, inputs, into the configured NN, a training dataset including multiple data objects, receives, from the configured NN, a classification of the data objects, and generates a performance assessment for the combination based on the classification. The software code determines a preferred combination of layers for the NN including selected candidate functions for the last activation layer and the loss layer, based on a comparison of the performance assessments.
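The combination search in this abstract can be sketched as a grid over candidate last-activation and loss functions, scoring each configuration on a dataset. The functions and the "performance assessment" (here just a validation loss) are simplified stand-ins, not the claimed system:

```python
import itertools
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(p, y):
    return -np.log(p[np.arange(len(y)), y] + 1e-9).mean()

def brier(p, y):
    onehot = np.eye(p.shape[1])[y]
    return ((p - onehot) ** 2).mean()

activations = {"softmax": softmax, "sigmoid": sigmoid}
losses = {"cross_entropy": cross_entropy, "brier": brier}

rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 3))       # stand-in for the NN's pre-activation output
labels = rng.integers(0, 3, size=100)

# Assess every (last activation, loss) combination and keep the best.
scores = {}
for act_name, loss_name in itertools.product(activations, losses):
    probs = activations[act_name](logits)
    scores[(act_name, loss_name)] = losses[loss_name](probs, labels)

best = min(scores, key=scores.get)
```

In the claimed system each combination would reconfigure and run the actual NN on a training dataset; the loop structure is the point of the sketch.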

INTEGRATED MACHINE LEARNING AUDIOVISUAL APPLICATION FOR A DEFINED SUBJECT

Disclosed herein are system, method, and computer program product embodiments for utilizing a feedback loop to continuously improve an artificial intelligence (AI) engine's determination of predictive features associated with a topic. An embodiment operates by training an AI engine for a topic using data from a data source, wherein the topic is associated with a geolocation. The embodiment first receives a set of predictive features for the topic from the trained AI engine. The embodiment then transmits the set of predictive features for the topic to a set of electronic devices. The embodiment next receives a set of audiovisual content captured by the set of electronic devices, where the set of electronic devices capture the set of audiovisual content based on the set of predictive features for the topic. The embodiment finally retrains the AI engine based on the received set of audiovisual content.

Methods and systems for identifying internal conditions in juvenile fish through non-invasive means
11798262 · 2023-10-24 ·

Methods and systems are disclosed for improvements in aquaculture that increase the number and harvesting efficiency of fish in an aquaculture setting by identifying and predicting internal conditions of juvenile fish based on external characteristics imaged through non-invasive means.

ACTIVE DATA COLLECTION, SAMPLING, AND GENERATION FOR USE IN TRAINING MACHINE LEARNING MODELS FOR AUTOMOTIVE OR OTHER APPLICATIONS

A method includes identifying one or more edge cases associated with at least one trained machine learning model, where the at least one trained machine learning model is configured to perform at least one function related to one or more vehicles. The method also includes obtaining raw data associated with the one or more edge cases from at least one of the one or more vehicles and selecting a subset of the raw data. The method further includes generating synthetic data associated with the one or more edge cases. In addition, the method includes at least one of: retraining the at least one trained machine learning model and training at least one new machine learning model using the selected subset of raw data and the synthetic data.
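A toy sketch of the data pipeline described above: flag edge cases where a (placeholder) model is uncertain, select a subset of the raw edge-case data, and generate synthetic variants for retraining. The confidence function, jitter-based synthesis, and all thresholds are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def is_edge_case(sample, model_confidence):
    """Flag samples where the (placeholder) model is uncertain."""
    return model_confidence(sample) < 0.6

def select_subset(samples, k):
    """Select a subset of the raw edge-case data (here: uniform sampling)."""
    idx = rng.choice(len(samples), size=k, replace=False)
    return [samples[i] for i in idx]

def synthesize(samples, n):
    """Generate synthetic edge-case data by jittering raw edge cases."""
    base = rng.choice(len(samples), size=n)
    return [samples[i] + rng.normal(scale=0.05, size=samples[i].shape) for i in base]

raw = [rng.normal(size=4) for _ in range(50)]          # stand-in vehicle data
confidence = lambda s: 1.0 / (1.0 + np.linalg.norm(s))  # toy model confidence
edge = [s for s in raw if is_edge_case(s, confidence)]
subset = select_subset(edge, k=min(5, len(edge)))
synthetic = synthesize(edge, n=10)
training_pool = subset + synthetic   # feeds retraining or a new model
```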

Video semantic segmentation method based on active learning

The present invention belongs to the technical field of computer vision and provides a video semantic segmentation method based on active learning, comprising an image semantic segmentation module, a data selection module based on active learning, and a label propagation module. The image semantic segmentation module is responsible for segmenting images and extracting the high-level features required by the data selection module; the data selection module selects an information-rich data subset at the image level and selects pixel blocks to be labeled at the pixel level; and the label propagation module realizes migration from image tasks to video tasks and quickly completes the segmentation of a video to obtain weakly-supervised data. The present invention can rapidly generate weakly-supervised data sets, reduce the cost of producing the data, and optimize the performance of a semantic segmentation network.
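The two-level selection in this abstract can be sketched with predictive entropy as a stand-in informativeness measure: rank whole images by mean pixel entropy, then rank pixel blocks within a chosen image. The entropy criterion and block size are assumptions, not the patented selection rule:

```python
import numpy as np

rng = np.random.default_rng(0)

def pixel_entropy(prob_map):
    """Per-pixel predictive entropy of an (H, W, C) softmax map."""
    return -(prob_map * np.log(prob_map + 1e-9)).sum(axis=-1)

def select_images(prob_maps, k):
    """Image level: pick the k images with highest mean entropy."""
    scores = [pixel_entropy(p).mean() for p in prob_maps]
    return np.argsort(scores)[-k:]

def select_pixel_blocks(prob_map, block=4, k=2):
    """Pixel level: pick the k pixel blocks with highest mean entropy."""
    ent = pixel_entropy(prob_map)
    h, w = ent.shape
    blocks = ent.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    flat = np.argsort(blocks.ravel())[-k:]
    return [divmod(int(i), blocks.shape[1]) for i in flat]

# Six 8x8 images with 3-class softmax outputs from a segmentation network.
maps = [rng.dirichlet(np.ones(3), size=(8, 8)) for _ in range(6)]
imgs = select_images(maps, k=2)
blocks = select_pixel_blocks(maps[imgs[-1]], block=4, k=2)
```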

BAYESIAN SEMANTIC SEGMENTATION ACTIVE LEARNING WITH BETA APPROXIMATION

Training of a machine vision model, a segmentation model, is performed by using an acquisition function for a small number of pixels of one or more training images. The acquisition function uses first mutual information and second mutual information to identify unlabelled pixels whose predicted label values carry high uncertainty. Training, prediction of labels, identification of pixels with highly uncertain labels, obtaining labels only for those pixels, and retraining are performed iteratively to finally provide the machine vision model. The iterative approach uses very few labelled pixels to obtain the final machine vision model, which accurately labels areas of a data image.
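One standard mutual-information acquisition of the kind this abstract gestures at is the BALD estimator, shown below over Monte Carlo prediction samples; this is a generic sketch, not the patent's specific first/second mutual information or beta approximation:

```python
import numpy as np

rng = np.random.default_rng(1)

def entropy(p, axis=-1):
    return -(p * np.log(p + 1e-9)).sum(axis=axis)

def bald_mutual_information(mc_probs):
    """Mutual information between predictions and model parameters,
    estimated from Monte Carlo samples of shape (S, N, C):
    I = H[mean over samples] - mean over samples of H[sample]."""
    mean_p = mc_probs.mean(axis=0)
    return entropy(mean_p) - entropy(mc_probs).mean(axis=0)

# 10 stochastic forward passes over 100 unlabelled pixels, 3 classes.
mc = rng.dirichlet(np.ones(3), size=(10, 100))
mi = bald_mutual_information(mc)
query = np.argsort(mi)[-5:]   # pixels with the most uncertain labels
```

Only the queried pixels would be sent for labelling before the next retraining iteration.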

Semantic Image Fill at High Resolutions

Semantic fill techniques are described that support generating fill and editing images from semantic inputs. A user input, for example, is received by a semantic fill system that indicates a selection of a first region of a digital image and a corresponding semantic label. The user input is utilized by the semantic fill system to generate a guidance attention map of the digital image. The semantic fill system leverages the guidance attention map to generate a sparse attention map of a second region of the digital image. A semantic fill of pixels is generated for the first region based on the semantic label and the sparse attention map. The edited digital image is displayed in a user interface.
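A heavily simplified sketch of the flow in this abstract: a user-selected region plus an attention map over the rest of the image drives the fill. The similarity-based "guidance attention" and constant fill are toy stand-ins for the described generative fill:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy 8x8 grayscale image with a user-selected first region to fill.
image = rng.random((8, 8))
region = np.zeros((8, 8), dtype=bool)
region[2:5, 2:5] = True

def guidance_attention(image, region):
    """Attend to pixels outside the region that resemble its surroundings."""
    outside_mean = image[~region].mean()
    return np.exp(-np.abs(image - outside_mean)) * ~region

def semantic_fill(image, region, attn):
    """Fill the region with an attention-weighted average of the rest."""
    fill_value = (image * attn).sum() / attn.sum()
    out = image.copy()
    out[region] = fill_value
    return out

attn = guidance_attention(image, region)
filled = semantic_fill(image, region, attn)
```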

Compositional Action Machine Learning Mechanisms
20230360364 · 2023-11-09 ·

Mechanisms are provided for performing machine learning (ML) training of a ML action recognition computer model which involves processing an original input dataset to generate an object feature bank comprising object feature data structures for a plurality of different objects. For an input video, a verb data structure and an original object data structure are generated and a candidate object feature data structure is selected from the object feature bank for generation of pseudo composition (PC) training data. The PC training data is generated based on the selected candidate object feature data structure and comprises a combination of the verb data structure and the candidate object feature data structure. The PC training data represents a combination of an action and an object not represented in the original input dataset. ML training of the ML action recognition computer model is performed based on an unseen combination comprising the PC training data.
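The pseudo-composition step above can be sketched as: build an object feature bank from the dataset, then pair a verb's features with a candidate object the verb never co-occurred with. The dataset, feature shapes, and first-candidate selection rule are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

def build_object_bank(dataset):
    """Average object features per object class into a feature bank."""
    bank = {}
    for obj, feat in dataset:
        bank.setdefault(obj, []).append(feat)
    return {obj: np.mean(feats, axis=0) for obj, feats in bank.items()}

def pseudo_composition(verb, verb_feat, bank, seen_pairs):
    """Combine the verb's features with a candidate object feature for a
    (verb, object) pair absent from the original training data."""
    candidates = [o for o in bank if (verb, o) not in seen_pairs]
    obj = candidates[0]
    return np.concatenate([verb_feat, bank[obj]]), (verb, obj)

dataset = [("cup", rng.normal(size=4)),
           ("cup", rng.normal(size=4)),
           ("ball", rng.normal(size=4))]
bank = build_object_bank(dataset)
seen = {("hold", "cup")}                       # pairs seen in the input videos
pc_feat, new_pair = pseudo_composition("hold", rng.normal(size=4), bank, seen)
```

Here `new_pair` is an unseen action-object combination, and `pc_feat` is the pseudo-composition training datum the recognition model would be trained on.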