G06N3/0455

SYSTEM AND METHOD FOR UNSUPERVISED LEARNING OF SEGMENTATION TASKS
20230050573 · 2023-02-16

Apparatuses and methods are provided for training a feature extraction model by determining a loss function for use in unsupervised image segmentation. A method includes determining a clustering loss from an image; determining a weakly supervised contrastive loss of the image using cluster pseudo labels based on the clustering loss; and determining the loss function based on the clustering loss and the weakly supervised contrastive loss.
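The combination of the two losses can be sketched as follows. This is a minimal illustration, not the claimed method: the nearest-centroid clustering loss, the cosine-similarity contrastive term, the `temperature` and `weight` parameters, and all function names are assumptions introduced here. Features are assumed to be non-zero vectors.

```python
import math

def nearest_centroid(x, centroids):
    """Return (index, squared distance) of the centroid closest to x."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
    idx = min(range(len(dists)), key=dists.__getitem__)
    return idx, dists[idx]

def clustering_loss_and_labels(features, centroids):
    """Assumed clustering loss: mean squared distance to the nearest centroid.
    The nearest-centroid indices double as cluster pseudo labels."""
    pairs = [nearest_centroid(x, centroids) for x in features]
    labels = [i for i, _ in pairs]
    loss = sum(d for _, d in pairs) / len(pairs)
    return loss, labels

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def contrastive_loss(features, labels, temperature=0.5):
    """Contrastive loss in which samples sharing a pseudo label are positives."""
    total, count = 0.0, 0
    n = len(features)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = {j: cosine(features[i], features[j]) / temperature for j in others}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        count += 1
    return total / count if count else 0.0

def total_loss(features, centroids, weight=1.0):
    """Loss function built from the clustering loss and the contrastive loss."""
    c_loss, labels = clustering_loss_and_labels(features, centroids)
    return c_loss + weight * contrastive_loss(features, labels)
```

The pseudo labels here come from the same assignment step that produces the clustering loss, which is one way to read "cluster pseudo labels based on the clustering loss"; the abstract does not fix the exact form of either term.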

HUMAN-OBJECT INTERACTION DETECTION

A human-object interaction detection method, a neural network and a training method therefor are provided. The human-object interaction detection method includes: extracting a plurality of first target features and one or more first motion features from an image feature of an image to be detected; fusing each first target feature and some of the first motion features to obtain enhanced first target features; fusing each first motion feature and some of the first target features to obtain enhanced first motion features; processing the enhanced first target features to obtain target information of a plurality of targets including human targets and object targets; processing the enhanced first motion features to obtain motion information of one or more motions, where each motion is associated with one human target and one object target; and matching the plurality of targets with the one or more motions to obtain a human-object interaction detection result.

METHOD FOR TRAINING NON-AUTOREGRESSIVE TRANSLATION MODEL
20230051373 · 2023-02-16

A method for training a non-autoregressive translation (NAT) model includes: acquiring a source language text, a target language text corresponding to the source language text and a target length of the target language text; generating a target language prediction text and a prediction length by inputting the source language text into the NAT model, in which initialization parameters of the NAT model are determined based on parameters of a pre-trained translation model; and obtaining a target NAT model by training the NAT model based on the target language text, the target language prediction text, the target length and the prediction length.
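The training signal described above can be sketched as a token-level loss plus a length loss, with the NAT model initialized from a pre-trained model. This is an illustrative sketch under assumptions: the cross-entropy form of both terms, the `length_weight` parameter, the name-based parameter copy, and every function name are hypothetical and not taken from the source.

```python
import math

def init_from_pretrained(nat_params, pretrained_params):
    """Copy each pre-trained parameter whose name also exists in the NAT model.
    Parameters that exist only in the NAT model keep their initial values."""
    merged = dict(nat_params)
    for name, value in pretrained_params.items():
        if name in merged:
            merged[name] = value
    return merged

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class (must be > 0)."""
    return -math.log(probs[target_index])

def nat_loss(token_probs, target_tokens, length_probs, target_length,
             length_weight=1.0):
    """Assumed objective: mean token cross-entropy plus weighted length cross-entropy.

    token_probs[i] is the predicted distribution over the vocabulary at position i;
    length_probs[n] is the predicted probability that the target has length n."""
    token_loss = sum(
        cross_entropy(p, t) for p, t in zip(token_probs, target_tokens)
    ) / len(target_tokens)
    length_loss = cross_entropy(length_probs, target_length)
    return token_loss + length_weight * length_loss
```

Predicting the length separately reflects the fact that a non-autoregressive decoder emits all positions in parallel and so needs the output length before decoding.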

BLOOD FLOW FIELD ESTIMATION APPARATUS, LEARNING APPARATUS, BLOOD FLOW FIELD ESTIMATION METHOD, AND PROGRAM

A blood flow field estimation apparatus is provided that includes an estimation unit and an output unit. The estimation unit uses a learned model, obtained in advance by machine learning, that captures the relationship between organ tissue three-dimensional structure data and a blood flow field in the organ. The organ tissue three-dimensional structure data includes image data of a plurality of organ cross-sectional images, each pixel having a bit depth of two or more bits, and image position information indicating the position in the organ of the image shown in each organ cross-sectional image. Based on the organ tissue three-dimensional structure data of the organ of an estimation target, the estimation unit estimates the blood flow field in that organ, and the output unit outputs the estimation result.

METHOD OF FUSING IMAGE, AND METHOD OF TRAINING IMAGE FUSION MODEL

A method of fusing an image, a method of training an image fusion model, an electronic device, and a storage medium are provided. The method of fusing the image includes: encoding a stitched image obtained by stitching a foreground image and a background image, so as to obtain a feature map; and decoding the feature map to obtain a fused image, wherein the feature map is decoded by: performing a weighting on the feature map by using an attention mechanism, so as to obtain a weighted feature map; performing a fusion on the feature map according to feature statistical data of the weighted feature map, so as to obtain a fused feature; and decoding the fused feature to obtain the fused image.
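One possible reading of the weighting and statistics-based fusion steps is sketched below. The abstract does not say which attention mechanism or which statistics are used, so everything here is an assumption: a per-channel softmax stands in for the attention mechanism, the statistics are the per-channel mean and standard deviation, and the fusion re-normalizes the original feature map to those statistics (in the style of adaptive instance normalization). All function names are hypothetical.

```python
import math

def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    s = sum(exps)
    return [e / s for e in exps]

def mean_std(values, eps=1e-5):
    """Population mean and standard deviation (eps avoids division by zero)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var + eps)

def attention_weight(channel):
    """Scale each position by its softmax weight, rescaled so weights average to 1."""
    weights = softmax(channel)
    n = len(channel)
    return [v * w * n for v, w in zip(channel, weights)]

def fuse_channel(channel):
    """Normalize the channel, then apply the weighted channel's mean and std."""
    weighted = attention_weight(channel)
    w_mean, w_std = mean_std(weighted)
    c_mean, c_std = mean_std(channel)
    return [(v - c_mean) / c_std * w_std + w_mean for v in channel]

def fuse_feature_map(feature_map):
    """feature_map is a list of channels, each a flat list of spatial values."""
    return [fuse_channel(channel) for channel in feature_map]
```

In this sketch the fused feature keeps the spatial pattern of the original feature map while taking its scale and offset from the attention-weighted version.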

HUMAN-OBJECT INTERACTION DETECTION

A human-object interaction detection method, a neural network and a training method therefor are provided. The human-object interaction detection method includes: performing first target feature extraction on an image feature of an image; performing first interaction feature extraction on the image feature; processing a plurality of first target features to obtain target information of a plurality of detected targets; processing one or more first interaction features to obtain motion information of a motion, human information of a human target corresponding to each motion, and object information of an object target corresponding to each motion; matching the plurality of detected targets with one or more motions; and updating human information of a corresponding human target based on target information of a detected target matching the corresponding human target, and updating object information of a corresponding object target based on target information of a detected target matching the corresponding object target.

HUMAN-OBJECT INTERACTION DETECTION

A human-object interaction detection method, a neural network and a training method therefor are provided. The human-object interaction detection method includes: performing first target feature extraction on image features of an image to obtain first target features; performing first interaction feature extraction on image features to obtain first interaction features and scores thereof; determining at least some first interaction features in the first interaction features based on the score of each of the first interaction features; determining first motion features based on the at least some first interaction features and the image features; processing the first target features to obtain target information of targets in the image; processing the first motion features to obtain motion information of one or more motions in the image; and matching the targets with the motions to obtain a human-object interaction detection result.
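The score-based selection step can be illustrated with a top-k filter. This is only a sketch: the abstract says the selection is based on each feature's score but does not say how, so keeping the `k` highest-scoring features is an assumption (a score threshold would be another option), and `select_top_k` is a hypothetical name.

```python
def select_top_k(features, scores, k):
    """Keep the k interaction features with the highest scores.

    Returns the selected features and their original indices, ordered from
    highest to lowest score. Ties keep their original relative order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [features[i] for i in order], order
```

For example, `select_top_k(["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.7], 2)` keeps `"b"` and `"d"`, so only those two features would go on to the motion feature step.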

MACHINE LEARNING MODELS FOR DETECTING TOPIC DIVERGENT DIGITAL VIDEOS
20230046248 · 2023-02-16

The present disclosure relates to systems, methods, and non-transitory computer readable media for accurately and flexibly generating topic divergence classifications for digital videos based on words from the digital videos and further based on a digital text corpus representing a target topic. Particularly, the disclosed systems utilize a topic-specific knowledge encoder neural network to generate a topic divergence classification for a digital video to indicate whether or not the digital video diverges from a target topic. In some embodiments, the disclosed systems determine topic divergence classifications contemporaneously in real time for livestream digital videos or for stored digital videos (e.g., digital video tutorials). For instance, to generate a topic divergence classification, the disclosed systems generate and compare contextualized feature vectors from digital videos with corpus embeddings from a digital text corpus representing a target topic utilizing a topic-specific knowledge encoder neural network.

DATA AUGMENTATION USING MACHINE TRANSLATION CAPABILITIES OF LANGUAGE MODELS

Disclosed are embodiments for improving training data for machine learning (ML) models. In an embodiment, a method is disclosed where an augmentation engine receives a seed example, the seed example stored in a seed training data set; generates an encoded seed example of the seed example using an encoder; inputs the encoded seed example into a machine learning model and receives a candidate example generated by the machine learning model; determines that the candidate example is similar to the encoded seed example; and augments the seed training data set with the candidate example.
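The augmentation loop above can be sketched as follows. The sketch rests on several assumptions that are not in the source: a bag-of-words `Counter` stands in for the encoder, `generate` is a hypothetical callable standing in for the machine learning model, and similarity is taken to be cosine similarity against a `threshold`.

```python
import math
from collections import Counter

def encode(text):
    """Toy encoder (assumption): lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def augment(seed_set, generate, threshold=0.5):
    """Return the seed set extended with candidates similar to their seeds.

    generate(encoded_seed) -> candidate text; a placeholder for the model."""
    augmented = list(seed_set)
    for seed in seed_set:
        encoded_seed = encode(seed)
        candidate = generate(encoded_seed)
        # Keep the candidate only if it stays close to the encoded seed.
        if cosine(encode(candidate), encoded_seed) >= threshold:
            augmented.append(candidate)
    return augmented
```

The similarity check acts as a filter, so a candidate that drifts too far from its seed is discarded rather than added to the training data.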

TREND-INFORMED DEMAND FORECASTING

In an approach to jointly learning an uncertainty-aware, trend-informed neural network for a demand forecasting model, a machine learning model is trained to capture uncertainty in input forecasts. The uncertainty in a latent space is represented using an auto-encoder based neural architecture. The uncertainty-aware latent space is modeled and optimized to generate an embedding space. A time-series regressor model is learned from the embedding space. A machine learning model is trained for trend-aware demand forecasting based on said time-series regressor model.