G06V10/7753

Joint temporal segmentation and classification of user activities in egocentric videos

Embodiments disclose methods, systems, and a non-transitory computer-readable medium for joint temporal segmentation and classification of user activities in an egocentric video. The method includes extracting low-level features from a live dataset based on predefined feature categories; determining at least one activity change frame from the egocentric video frames based on the extracted features; dividing the live dataset into partitions based on the activity change frame, each partition beginning with a candidate frame; computing a recursive cost function at the candidate frame of each partition based on dynamic programming; determining a beginning time instant of the candidate frame based on the computation; segmenting the live dataset into multiple segments based on the determined time instants; identifying at least one activity segment that corresponds to a user activity using a trained activity model based on a multiple-instance learning approach; and simultaneously associating a predefined activity label with the identified activity segment.
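The recursive-cost segmentation step above can be sketched in miniature. The following is an illustrative assumption, not the patented method: it uses 1-D per-frame features, a within-segment variance cost, and a fixed per-segment penalty, and recovers segment beginning instants by dynamic programming.

```python
def segment_cost(feats, i, j):
    """Within-segment cost: sum of squared deviations from the segment mean."""
    seg = feats[i:j]
    mean = sum(seg) / len(seg)
    return sum((x - mean) ** 2 for x in seg)

def dp_segment(feats, penalty=1.0):
    """Return segment beginning indices via a recursive cost function.

    C(j) = min over candidate frames i < j of C(i) + cost(i, j) + penalty,
    with C(0) = 0, evaluated by dynamic programming.
    """
    n = len(feats)
    best = [0.0] + [float("inf")] * n   # best[j]: minimal cost for feats[:j]
    back = [0] * (n + 1)                # back[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + segment_cost(feats, i, j) + penalty
            if c < best[j]:
                best[j], back[j] = c, i
    # Recover the beginning time instant of each segment by backtracking
    bounds, j = [], n
    while j > 0:
        bounds.append(back[j])
        j = back[j]
    return sorted(bounds)

feats = [0.0, 0.1, 0.0, 5.0, 5.1, 4.9, 5.0]
print(dp_segment(feats, penalty=0.5))  # → [0, 3]: two activity segments
```

With a clear feature jump between frames 2 and 3, the optimizer pays one segment penalty to split there rather than absorb the large within-segment variance of a single segment.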

DEEP LEARNING BASED INSTANCE SEGMENTATION VIA MULTIPLE REGRESSION LAYERS
20220366564 · 2022-11-17

Novel tools and techniques are provided for implementing digital microscopy imaging using deep learning-based segmentation and/or implementing instance segmentation based on partial annotations. In various embodiments, a computing system might receive first and second images, the first image comprising a field of view of a biological sample, while the second image comprises labeling of objects of interest in the biological sample. The computing system might encode, using an encoder, the second image to generate third and fourth encoded images (different from each other) that comprise proximity scores or maps. The computing system might train an AI system to predict objects of interest based at least in part on the third and fourth encoded images. The computing system might generate (using regression) and decode (using a decoder) two or more images based on a new image of a biological sample to predict labeling of objects in the new image.
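The encoding step can be illustrated with a toy proximity map. The abstract describes two distinct encoded images; this sketch shows one plausible form, where each pixel's score decays with distance to the nearest labeled object pixel. The exponential decay and the small grid are illustrative assumptions, not the patented encoding.

```python
import math

def proximity_map(label, decay=1.0):
    """label: 2-D list of 0/1 object labels; returns scores in (0, 1]."""
    objs = [(r, c) for r, row in enumerate(label)
            for c, v in enumerate(row) if v]
    out = []
    for r, row in enumerate(label):
        out.append([])
        for c, _ in enumerate(row):
            d = min(math.hypot(r - orr, c - occ) for orr, occ in objs)
            out[-1].append(math.exp(-decay * d))  # 1.0 at an object pixel
    return out

label = [[0, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 1]]
pm = proximity_map(label)
print(pm[1][1], pm[3][3])  # → 1.0 1.0 (object pixels score highest)
```

A second encoding (the "fourth" image) could use a different decay or a directional distance, giving the regression layers two complementary targets.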

Integrated machine learning audiovisual application for a defined subject

Disclosed herein are system, method, and computer program product embodiments for utilizing a feedback loop to continuously improve an artificial intelligence (AI) engine's determination of predictive features associated with a topic. An embodiment operates by training an AI engine for a topic using data from a data source, wherein the topic is associated with a geolocation. The embodiment first receives a set of predictive features for the topic from the trained AI engine. The embodiment transmits the set of predictive features for the topic to a set of electronic devices. The embodiment second receives a set of audiovisual content captured by the set of electronic devices. The set of electronic devices capture the set of audiovisual content based on the set of predictive features for the topic. The embodiment finally retrains the AI engine based on the received set of audiovisual content.
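The feedback loop above can be sketched with stand-in components. All names here (`Engine`, the lambda "device", the tag-counter model) are hypothetical simplifications so the receive/transmit/capture/retrain cycle stays visible; they are not from the disclosure.

```python
from collections import Counter

class Engine:
    """Toy 'AI engine': predictive features are the most frequent tags."""
    def __init__(self):
        self.counts = Counter()

    def train(self, data):
        self.counts.update(data)

    def predictive_features(self, k=2):
        return [tag for tag, _ in self.counts.most_common(k)]

def feedback_round(engine, devices):
    feats = engine.predictive_features()     # 1. receive predictive features
    captured = []
    for capture in devices:                  # 2-3. transmit; devices capture
        captured.extend(capture(feats))
    engine.train(captured)                   # 4. retrain on captured content
    return feats, captured

engine = Engine()
engine.train(["sunset", "sunset", "crowd"])  # initial data-source training
devices = [lambda feats: [f + "_clip" for f in feats]]  # toy capture behavior
feats, captured = feedback_round(engine, devices)
print(feats, captured)  # → ['sunset', 'crowd'] ['sunset_clip', 'crowd_clip']
```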

Proposal learning for semi-supervised object detection

A method for generating a neural network for detecting one or more objects in images includes generating one or more self-supervised proposal learning losses based on one or more proposal features and corresponding proposal feature predictions. One or more consistency-based proposal learning losses are generated based on noisy proposal feature predictions and the corresponding proposal predictions without noise. A combined loss is generated using the one or more self-supervised proposal learning losses and the one or more consistency-based proposal learning losses. The neural network is updated based on the combined loss.
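The loss combination can be sketched as follows. The abstract does not specify the loss forms or weights, so this uses plain mean-squared-error terms and configurable weights as assumptions.

```python
def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def combined_proposal_loss(feats, feat_preds, noisy_preds, clean_preds,
                           w_self=1.0, w_cons=1.0):
    # Self-supervised term: proposal features vs. their predictions
    l_self = mse(feats, feat_preds)
    # Consistency term: predictions from noisy proposals vs. noise-free ones
    l_cons = mse(noisy_preds, clean_preds)
    return w_self * l_self + w_cons * l_cons

loss = combined_proposal_loss(
    feats=[0.2, 0.8], feat_preds=[0.2, 0.8],       # perfect reconstruction
    noisy_preds=[0.5, 0.5], clean_preds=[0.6, 0.4])
print(loss)  # ≈ 0.01, all from the consistency term
```

The key property is that both terms need only proposals, not ground-truth boxes, which is what makes the scheme usable on unlabeled images.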

ELECTRONIC DEVICE AND CONTROLLING METHOD THEREOF

An electronic device and a controlling method thereof are provided. A controlling method of an electronic device according to the disclosure includes: performing first learning for a neural network model for acquiring a video sequence including a talking head of a random user based on a plurality of learning video sequences including talking heads of a plurality of users, performing second learning for fine-tuning the neural network model based on at least one image including a talking head of a first user different from the plurality of users and first landmark information included in the at least one image, and acquiring a first video sequence including the talking head of the first user based on the at least one image and pre-stored second landmark information using the neural network model for which the first learning and the second learning were performed.
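The two learning stages can be illustrated with a toy model: broad "first learning" over many users' data, then "second learning" that fine-tunes on a single new user's sample. The scalar model and SGD updates are illustrative assumptions only, not the disclosed neural network.

```python
def sgd_fit(w, samples, lr, steps):
    """Fit y = w * x by stochastic gradient descent on squared error."""
    for _ in range(steps):
        for x, y in samples:
            w -= lr * 2 * (w * x - y) * x
    return w

# First learning: many users whose data roughly follows y = 2x
many_users = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
w = sgd_fit(0.0, many_users, lr=0.01, steps=200)

# Second learning: fine-tune on one sample of a new user (y = 3x)
new_user = [(1.0, 3.0)]
w_ft = sgd_fit(w, new_user, lr=0.05, steps=200)
print(round(w, 1), round(w_ft, 1))  # → 2.0 3.0
```

The pretrained weight gives the fine-tuning stage a useful starting point, which is why only one image of the first user is needed in the second stage.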

OBJECT DETECTOR TRAINED VIA SELF-SUPERVISED TRAINING ON RAW AND UNLABELED VIDEOS
20230169344 · 2023-06-01

An example system includes a processor to receive an image containing an object to be detected. The processor is to detect the object in the image via a binary object detector trained via a self-supervised training on raw and unlabeled videos.
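The abstract does not say how supervision is derived from raw video; a common self-supervised proxy is motion, so this sketch pseudo-labels frame regions as "object" when they change between consecutive frames. This is purely an illustrative assumption.

```python
def motion_pseudo_labels(prev, curr, thresh=0.1):
    """Per-pixel binary labels from frame differencing of two 2-D frames."""
    return [[1 if abs(c - p) > thresh else 0
             for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

prev = [[0.0, 0.0], [0.0, 0.0]]
curr = [[0.0, 0.9], [0.0, 0.0]]   # one pixel "moved" between frames
print(motion_pseudo_labels(prev, curr))  # → [[0, 1], [0, 0]]
```

Such pseudo-labels cost nothing to produce from unlabeled video and can train a binary object/no-object detector without human annotation.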

JOINT OBJECT AND OBJECT PART DETECTION USING WEB SUPERVISION

A method for generating object and part detectors includes accessing a collection of training images. The collection of training images includes images annotated with an object label and images annotated with a respective part label for each of a plurality of parts of the object. Joint appearance-geometric embeddings for regions of a set of the training images are generated. At least one detector for the object and its parts is learnt using annotations of the training images and respective joint appearance-geometric embeddings, e.g., using multi-instance learning for generating parameters of scoring functions which are used to identify high scoring regions for learning the object and its parts. The detectors may be output or used to label regions of a new image with object and part labels.
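The multi-instance learning step can be sketched as follows: each training image is a "bag" of candidate regions carrying only an image-level label, a scoring function scores every region, and the bag's score is the maximum over its regions. The dot-product scorer stands in for the joint appearance-geometric embedding scoring functions, as an assumption.

```python
def score(w, region):
    return sum(wi * xi for wi, xi in zip(w, region))

def bag_score(w, regions):
    """Bag (image) score: max over its candidate regions."""
    return max(score(w, r) for r in regions)

def top_region(w, regions):
    """High-scoring region used as the positive instance for learning."""
    return max(regions, key=lambda r: score(w, r))

w = [1.0, -0.5]                    # toy scorer parameters
bag = [[0.2, 0.9], [0.8, 0.1]]     # two candidate regions (embeddings)
print(top_region(w, bag), bag_score(w, bag))
```

Training alternates between selecting the high-scoring region in each positive bag and updating the scorer parameters to separate those regions from negative bags.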

Method and System for Reducing False Positives in Object Detection Neural Networks Caused by Novel Objects

System and method for reducing false negatives in object detection, including: extracting an object of interest from a respective image in a first set of training data that includes in-distribution (ID) data. For each object of interest extracted from the respective image in the first set of training data: fusing the object of interest with an image from a second set of data that does not include any objects of interest to form a fused image; adding the fused image to the training data; and using the training data to train a detection model for object detection.
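The fusion step can be sketched minimally: paste an extracted object-of-interest crop into a background image containing no objects of interest, then add the result to the training set. The grid representation and raw pixel copy are simplifying assumptions; a real system would blend edges and carry the new box label along.

```python
import copy

def fuse(background, crop, top, left):
    """Paste a crop into a deep copy of the background at (top, left)."""
    fused = copy.deepcopy(background)
    for r, row in enumerate(crop):
        for c, v in enumerate(row):
            fused[top + r][left + c] = v
    return fused

background = [[0] * 4 for _ in range(4)]   # no objects of interest
crop = [[7, 7], [7, 7]]                    # extracted object of interest
training_data = [background]
training_data.append(fuse(background, crop, top=1, left=1))
print(training_data[-1])  # → [[0,0,0,0],[0,7,7,0],[0,7,7,0],[0,0,0,0]]
```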

User operational space context map-actuated risk prediction and reduction cognitive suit

Generating a risk and constraint labeled context map of an operational space is provided. The risk and constraint labeled context map of the operational space corresponding to a user of a cognitive suit is generated to drive the cognitive suit contextually using three-dimensional reconstruction, virtual reality, and semi-supervised learning. Labeled risks and constraints in the risk and constraint labeled context map are associated with cognitive suit actuation events to deploy a set of mitigation strategies to address the labeled risks and constraints. An apparatus embedded in the cognitive suit is actuated to deploy the set of mitigation strategies in response to sensing a labeled risk or labeled constraint proximate to the user along a trajectory of the user in the operational space.

Dynamic lighting states based on context
11265994 · 2022-03-01

During operation, a computer obtains information specifying a lighting configuration of one or more lights in an environment, where the lighting configuration includes the one or more lights at predefined or predetermined locations in the environment. Then, the computer receives sensor data associated with the environment. Moreover, the computer analyzes the sensor data to determine a context associated with the environment. Then, based at least in part on the lighting configuration, a layout of the environment, and the determined context, the computer automatically determines the dynamic lighting states of the one or more lights, where a dynamic lighting state of a given light includes an intensity and a color of the given light. Next, the computer provides instructions corresponding to the dynamic lighting states to the one or more lights. Note that the dynamic lighting states may be based at least in part on a transferrable profile of an individual.
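The mapping from a determined context to per-light dynamic states can be sketched as a lookup over the lighting configuration. The context names and the intensity/color table below are invented placeholders, not values specified by the abstract.

```python
STATE_TABLE = {                 # context -> (intensity 0-1, RGB color)
    "movie":   (0.2, (255, 147, 41)),    # dim, warm
    "reading": (0.9, (255, 255, 255)),   # bright, neutral
}

def dynamic_states(lighting_config, context):
    """Return a dynamic state for every light at its predefined location."""
    intensity, color = STATE_TABLE.get(context, (0.5, (255, 255, 255)))
    return {light_id: {"intensity": intensity, "color": color}
            for light_id, location in lighting_config.items()}

config = {"lamp-1": (0, 0), "strip-2": (3, 1)}   # predefined locations
states = dynamic_states(config, "movie")
print(states["lamp-1"]["intensity"])  # → 0.2
```

A fuller version would also weight the per-light states by the layout (e.g., distance of each light from the sensed activity) before issuing instructions to the lights.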