G06V40/164

Method and system for real-time and offline de-identification of facial regions from regular and occluded color video streams obtained during diagnostic medical procedures

Systems and techniques that facilitate real-time and/or offline de-identification of facial regions from regular and/or occluded color video streams obtained during diagnostic medical procedures are provided. A detection component can generate a bounding box substantially around a person in a frame of a video stream, can generate a heatmap showing key points or anatomical masks of the person based on the bounding box, and can localize a face or facial region of the person based on the key points or anatomical masks. An anonymization component can anonymize pixels in the frame that correspond to the face or facial region. A tracking component can track the face or facial region in a subsequent frame based on a structural similarity index between the frame and the subsequent frame being above a threshold. If the structural similarity index between the frame and the subsequent frame is above the threshold, the tracking component can track the face or facial region in the subsequent frame without having the detection component generate a bounding box or a heatmap in the subsequent frame, and the anonymization component can anonymize pixels in the subsequent frame corresponding to the tracked face or facial region.
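The tracking decision described above (reuse the prior face localization when frame-to-frame structural similarity exceeds a threshold, otherwise re-run detection) can be sketched as follows. This is an illustrative Python sketch, not the patented implementation; `detect`, the box format, and the global-SSIM shortcut are assumptions.

```python
import numpy as np

def ssim(x, y, c1=6.5025, c2=58.5225):
    """Global structural similarity index between two grayscale frames.
    c1, c2 are the usual (k1*L)**2, (k2*L)**2 constants with L=255."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def anonymize(frame, box):
    """Black out the pixels inside the face bounding box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    out = frame.copy()
    out[y0:y1, x0:x1] = 0
    return out

def process(prev_frame, frame, prev_box, threshold=0.9, detect=None):
    """If the frames are structurally similar, reuse the previous face
    localization; otherwise fall back to the (expensive) detector."""
    if prev_frame is not None and ssim(prev_frame, frame) > threshold:
        box = prev_box            # track: skip bounding box + heatmap steps
    else:
        box = detect(frame)       # re-run the detection component
    return anonymize(frame, box), box
```

When consecutive frames are nearly identical, as is common in procedure video, the detector and heatmap stages are skipped entirely, which is what makes the real-time path feasible.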

MULTIMODAL METHOD FOR DETECTING VIDEO, MULTIMODAL VIDEO DETECTING SYSTEM AND NON-TRANSITORY COMPUTER READABLE MEDIUM
20230135866 · 2023-05-04

A multimodal method for detecting video includes the following steps: receiving a message to be detected, which corresponds to a video to be detected, to obtain a multimodal association result; generating a plurality of detecting conditions according to the multimodal association result; searching a plurality of videos in a video detection database according to the detecting conditions to obtain a target video, where each of the videos includes a plurality of video paragraphs and each video paragraph includes a piece of multimodal related data; comparing the detecting conditions with the multimodal related data of the video paragraphs to obtain a matching video paragraph and using the video corresponding to the matching video paragraph as the target video; and outputting the target video and the video to be detected to a display device for display.
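The condition-matching step can be sketched as a simple lookup over per-paragraph metadata. A hypothetical Python sketch; the dictionary representation of conditions and multimodal related data is an assumption, not the claimed data model.

```python
def find_target_video(conditions, videos):
    """videos: {video_id: [paragraph_metadata_dict, ...]}.
    Returns the id of the first video containing a video paragraph whose
    multimodal related data satisfies every detecting condition."""
    for video_id, paragraphs in videos.items():
        for meta in paragraphs:
            if all(meta.get(key) == value for key, value in conditions.items()):
                return video_id
    return None
```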

Image annotation using prior model sourcing

A method of image annotation includes selecting a plurality of annotation models related to an annotation task for an image, obtaining a candidate annotation map for the image from each of the plurality of annotation models, and selecting at least one of the candidate annotation maps to be displayed via a user interface, the candidate annotation maps comprising suggested annotations for the image. The method further includes receiving user selections or modifications of at least one of the suggested annotations from the candidate annotation map and generating a final annotation map based on the user selections or modifications.
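The merge of model suggestions with user input can be sketched minimally: start from a displayed candidate map and apply the user's selections or modifications on top. All names here are illustrative, not the claimed interface.

```python
def final_annotation_map(candidate_maps, user_choices):
    """candidate_maps: list of {region_id: label} suggestions, one per
    annotation model. user_choices: {region_id: label} selections or
    overrides received from the user interface."""
    final = dict(candidate_maps[0]) if candidate_maps else {}
    final.update(user_choices)   # user selections/modifications win
    return final
```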

Training Method of Facial Expression Embedding Model, Facial Expression Embedding Method and Facial Expression Embedding Device

The present disclosure provides a training method of a facial expression embedding model, a facial expression embedding method, and a facial expression embedding device. The method includes: determining a sample set, wherein each sample in the sample set includes three images and a sample label; and training the to-be-trained facial expression embedding model with the sample set, to obtain the trained facial expression embedding model, wherein the to-be-trained facial expression embedding model includes a to-be-trained full face embedding sub-model and a trained identity embedding sub-model, the trained facial expression embedding model includes a trained full face embedding sub-model and the trained identity embedding sub-model, and an output of the trained facial expression embedding model is determined by a difference between an output of the trained full face embedding sub-model and an output of the trained identity embedding sub-model.
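The key relation, that the model's output is the difference between the full-face embedding and the identity embedding, can be sketched directly. The two sub-models below are hypothetical stand-ins for the trained networks.

```python
import numpy as np

def expression_embedding(image, full_face_model, identity_model):
    """Expression code = full-face embedding minus identity embedding,
    so identity-specific appearance cancels out and the expression
    component remains."""
    return np.asarray(full_face_model(image)) - np.asarray(identity_model(image))
```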

DATA SET GENERATION AND AUGMENTATION FOR MACHINE LEARNING MODELS

A machine learning model (MLM) may be trained and evaluated. Attribute-based performance metrics may be analyzed to identify attributes for which the MLM is performing below a threshold when each are present in a sample. A generative neural network (GNN) may be used to generate samples including compositions of the attributes, and the samples may be used to augment the data used to train the MLM. This may be repeated until one or more criteria are satisfied. In various examples, a temporal sequence of data items, such as frames of a video, may be generated which may form samples of the data set. Sets of attribute values may be determined based on one or more temporal scenarios to be represented in the data set, and one or more GNNs may be used to generate the sequence to depict information corresponding to the attribute values.
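The train/evaluate/augment loop reads naturally as an iterative procedure. A minimal Python sketch under stated assumptions: all callables (`train_fn`, `eval_fn`, `generate_fn`) are hypothetical stand-ins for the MLM training step, per-attribute evaluation, and GNN sample generation.

```python
def augment_until_ok(train_fn, eval_fn, generate_fn, data,
                     threshold=0.8, max_rounds=5):
    """Train, score each attribute, then synthesize samples composing
    the attributes on which the model performs below the threshold;
    repeat until the criteria are satisfied or rounds run out."""
    model = None
    for _ in range(max_rounds):
        model = train_fn(data)
        scores = eval_fn(model)                         # {attribute: metric}
        weak = [a for a, s in scores.items() if s < threshold]
        if not weak:
            break                                       # criteria satisfied
        data = data + generate_fn(weak)                 # GNN-generated samples
    return model, data
```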

Systems and methods for image feature recognition using a lensless camera
11804068 · 2023-10-31

Systems and methods are described for generating pixel image data, using a lensless camera, based on light that travels through a patterned mask covering the lensless camera. The system applies a transformation function to the pixel image data to generate frequency domain image data. The system inputs the frequency domain image data into a machine learning model, wherein the machine learning model does not have access to data that represents the pattern of the mask. The model is trained using a set of images containing the image feature that are captured by the flat, lensless camera through the mask. The system processes the frequency domain image data using the machine learning model to determine whether the pixel image data depicts the image feature. The system further performs an action based on determining that the pixel image data depicts the image feature.
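The transform-then-classify pipeline can be sketched with a 2-D FFT as the transformation function. This is an assumption for illustration; the abstract does not specify which transformation is used, and `model` is a hypothetical classifier that never sees the mask pattern.

```python
import numpy as np

def to_frequency_domain(pixels):
    """Transformation step: 2-D FFT magnitude of the raw sensor image."""
    return np.abs(np.fft.fft2(pixels.astype(np.float64)))

def detect_feature(pixels, model):
    """model operates only on frequency-domain data; it has no access to
    the mask pattern itself."""
    return bool(model(to_frequency_domain(pixels)))
```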

Group Classifier Training Using Video Object Tracker
20230343095 · 2023-10-26

Systems, methods, and data storage devices for improved classifier training using a video object tracker to determine video data samples are described. A group classifier may be trained using machine learning to classify image objects, based on a set of machine learning parameters, and assign them a group identifier. A retraining data set may be determined based on video data that was assigned that group identifier based on an object tracker. The group classifier may be retrained using the retraining data set to determine an updated set of machine learning parameters and the group classifier may be updated with those parameters.
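The tracker-driven retraining step can be sketched in a few lines. A hypothetical sketch: `tracker` stands in for the video object tracker's per-frame group assignment, and `classifier_fit` for the retraining step that produces updated parameters.

```python
def build_retraining_set(frames, tracker, group_id):
    """Select video frames the object tracker assigned the target group
    identifier; these become the retraining data set."""
    return [frame for frame in frames if tracker(frame) == group_id]

def retrain(classifier_fit, frames, tracker, group_id):
    """Fit the group classifier on the tracker-derived set to obtain an
    updated set of machine learning parameters."""
    return classifier_fit(build_retraining_set(frames, tracker, group_id))
```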

Three-dimensional object reconstruction method and apparatus

A three-dimensional object reconstruction method, applied to a terminal device or a server, is provided. The method includes obtaining a plurality of video frames of an object; determining three-dimensional location information of key points of the object in the plurality of video frames and physical meaning information of the key points, the physical meaning information indicating respective positions of the object; determining a correspondence between the key points having the same physical meaning information in the plurality of video frames; and generating a three-dimensional object according to the correspondence and the three-dimensional location information of the key points.
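The correspondence step, grouping key points that carry the same physical meaning across frames, can be sketched as a set intersection. The per-frame dictionary representation below is an illustrative assumption.

```python
def correspondences(frames):
    """frames: one dict per video frame mapping a key point's physical
    meaning (e.g. "left_eye") to its 3-D location. Key points sharing
    the same physical-meaning label across all frames are put in
    correspondence for reconstruction."""
    common = set.intersection(*(set(f) for f in frames))
    return {name: [f[name] for f in frames] for name in common}
```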

Video call mediation method
11716424 · 2023-08-01

A video call mediation method of a system may include: receiving, by a server, a mediation request from a plurality of mobiles; mediating, by the server, a first mobile and a second mobile of the plurality of mobiles; establishing, by the first mobile and the second mobile, a video call session; receiving, by the first mobile, a video from the second mobile through the video call session; detecting, by the first mobile, a certain input; reporting, by the first mobile, the received video to the server in response to the certain input; ending, by the first mobile, the video call session with the second mobile and establishing, by the first mobile, a video call session with a third mobile; and verifying, by the server, the report and rejecting, by the server, additional mediation requests of the second mobile.
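The server-side bookkeeping implied by the last step, a verified report blocks the reported mobile from further mediation, can be sketched as follows. This is a hypothetical API for illustration, not the patented system; `verify` stands in for the server's report-verification step.

```python
class MediationServer:
    """Minimal sketch: verified reports add a mobile to a block set,
    and blocked mobiles are skipped during subsequent mediation."""
    def __init__(self):
        self.blocked = set()

    def report(self, reported_mobile, video, verify):
        if verify(video):
            self.blocked.add(reported_mobile)

    def mediate(self, requester, waiting_mobiles):
        # pair the requester with the first other, non-blocked mobile
        for mobile in waiting_mobiles:
            if mobile != requester and mobile not in self.blocked:
                return mobile
        return None
```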

Systems and methods for 3D facial modeling
11830141 · 2023-11-28

In an embodiment, a 3D facial modeling system includes a plurality of cameras configured to capture images from different viewpoints, a processor, and a memory containing a 3D facial modeling application and parameters defining a face detector, wherein the 3D facial modeling application directs the processor to obtain a plurality of images of a face captured from different viewpoints using the plurality of cameras, locate a face within each of the plurality of images using the face detector, wherein the face detector labels key feature points on the located face within each of the plurality of images, determine disparity between corresponding key feature points of located faces within the plurality of images, and generate a 3D model of the face using the depth of the key feature points.
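The disparity-to-depth step relies on the standard rectified-stereo relation depth = f·B/d, where f is the focal length in pixels, B the baseline between two cameras, and d the disparity of a matched key feature point. A minimal sketch, assuming rectified camera pairs; parameter names and the match format are illustrative, not from the patent.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Rectified-stereo relation: depth = f * B / d."""
    return focal_px * baseline_m / disparity_px

def keypoints_to_3d(matches, focal_px, baseline_m):
    """matches: {name: (x, y, disparity)} for key feature points matched
    across two of the camera viewpoints; returns 3-D points for the
    face model."""
    return {
        name: (x, y, depth_from_disparity(d, focal_px, baseline_m))
        for name, (x, y, d) in matches.items()
    }
```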