Patent classifications
G06V10/806
DETERMINING WATCH TIME LOSS REGIONS IN MEDIA CONTENT ITEMS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining watch time loss regions in media content items. In one aspect, features for a video are input into a trained model that is trained to output watch time loss regions. The trained model is trained using labels corresponding to known watch time loss regions in training videos and features of the training videos that correspond to the known watch time loss regions. A watch time loss region defines a time window of a video during which the likelihood of a user stopping playback of the video exceeds a threshold likelihood. In response to inputting the features for the video into the trained model, data regarding watch time loss regions for the video is obtained from the model and provided to an entity involved in providing the video to a user.
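The thresholding step described above lends itself to a short illustration. Below is a minimal Python sketch, assuming the trained model has already emitted a stop probability for each fixed-length window of the video; the 5-second window size, the 0.5 threshold, and the example probabilities are all assumptions, not details from the abstract:

```python
import numpy as np

def watch_time_loss_regions(stop_probs, threshold=0.5, window_s=5.0):
    """Group consecutive windows whose predicted stop probability exceeds
    the threshold into (start_s, end_s) regions.

    stop_probs: per-window probabilities emitted by the trained model
    (the model, its features, and the window size are assumptions here).
    """
    regions, start = [], None
    for i, p in enumerate(stop_probs):
        if p > threshold and start is None:
            start = i                                   # region opens
        elif p <= threshold and start is not None:
            regions.append((start * window_s, i * window_s))
            start = None                                # region closes
    if start is not None:                               # region runs to the end
        regions.append((start * window_s, len(stop_probs) * window_s))
    return regions

# e.g. probabilities for twelve 5-second windows of a one-minute video
probs = np.array([0.1, 0.2, 0.7, 0.8, 0.3, 0.1, 0.1, 0.6, 0.9, 0.9, 0.2, 0.1])
print(watch_time_loss_regions(probs))  # [(10.0, 20.0), (35.0, 50.0)]
```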
IMAGE SEMANTIC SEGMENTATION ALGORITHM AND SYSTEM BASED ON MULTI-CHANNEL DEEP WEIGHTED AGGREGATION
An image semantic segmentation algorithm and system based on multi-channel deep weighted aggregation. In step S1, semantic features with definite class information in an image, transition semantic features between low-level and high-level semantics, and semantic features of contextual logic relationships in an image are extracted by a low-level semantic channel, an auxiliary semantic channel, and a high-level semantic channel, respectively. In step S2, the three different semantic features obtained in S1 are fused by weighted aggregation to obtain global semantic information of the image. In step S3, the semantic features output from the respective semantic channels in S1 and the global semantic information from S2 are used to compute a loss function for training.
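For illustration, a minimal PyTorch sketch of the weighted-aggregation step (S2): three channels produce features that are fused with learned, normalized scalar weights. The single-conv "channels", channel width, and class count are assumptions standing in for the patent's unspecified networks:

```python
import torch
import torch.nn as nn

class WeightedAggregation(nn.Module):
    """Fuse three semantic channels with learned scalar weights (a sketch;
    plain convs stand in for the low-level, auxiliary, and high-level
    semantic channels, which the abstract does not specify)."""
    def __init__(self, channels=64, num_classes=21):
        super().__init__()
        self.low = nn.Conv2d(3, channels, 3, padding=1)   # low-level channel
        self.aux = nn.Conv2d(3, channels, 3, padding=1)   # auxiliary channel
        self.high = nn.Conv2d(3, channels, 3, padding=1)  # high-level channel
        self.weights = nn.Parameter(torch.ones(3))        # aggregation weights
        self.head = nn.Conv2d(channels, num_classes, 1)

    def forward(self, x):
        feats = [self.low(x), self.aux(x), self.high(x)]      # per-channel (S1)
        w = torch.softmax(self.weights, dim=0)                # normalized weights
        fused = sum(wi * f for wi, f in zip(w, feats))        # global semantics (S2)
        return self.head(fused), feats                        # logits + channel feats

logits, channel_feats = WeightedAggregation()(torch.randn(1, 3, 64, 64))
print(logits.shape)  # torch.Size([1, 21, 64, 64])
```

Returning the per-channel features alongside the fused logits mirrors S3, where both the individual channel outputs and the global information feed the training loss.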
METHOD FOR FEATURE DETECTION OF COMPLEX DEFECTS BASED ON MULTIMODAL DATA
The present disclosure discloses a method for feature detection of complex defects based on multimodal data, including feature extraction of multimodal data, multimodal feature cross-guided learning, multimodal feature fusion, and defect classification and regression. Feature extraction networks for multimodal two-dimensional data are constructed first, and a defect data set is sent to the networks for training; during training, cross-guided learning is implemented by using a multimodal feature cross-guidance network; then feature fusion is performed by using a weight-adaptive method; and finally a defect detection task is implemented by using a classification subnetwork and a regression subnetwork. In the present disclosure, fusion of the multimodal data in the process of feature detection of complex defects can be implemented efficiently, the capability of detecting complex defects in an industrial environment can be improved, and production efficiency in an industrial manufacturing process is ensured.
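A minimal sketch of one plausible weight-adaptive fusion step, in PyTorch: a small gate pools global context from two modality feature maps and produces a softmax weight per modality. The gating design is an assumption; the abstract only names the weight-adaptive idea:

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Weight-adaptive fusion of two modality feature maps (a sketch;
    the squeeze-style gate is an assumed design, not the patent's)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),            # global context of both modalities
            nn.Conv2d(2 * channels, 2, 1),      # one logit per modality
        )

    def forward(self, feat_a, feat_b):
        w = torch.softmax(self.gate(torch.cat([feat_a, feat_b], dim=1)), dim=1)
        return w[:, :1] * feat_a + w[:, 1:] * feat_b   # weighted sum of modalities

fused = AdaptiveFusion(32)(torch.randn(2, 32, 16, 16), torch.randn(2, 32, 16, 16))
print(fused.shape)  # torch.Size([2, 32, 16, 16])
```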
IMAGE PROCESSING METHOD, APPARATUS AND DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM
The embodiments of the present application disclose an image processing method, an image processing apparatus, an image processing device, and a computer-readable storage medium. The method comprises: obtaining a plurality of view images captured by a plurality of acquisition apparatuses at different views on a vehicle and an aerial view feature at a previous moment of a current moment; extracting temporal information from the aerial view feature at the previous moment according to a preset aerial view query vector, extracting spatial information from a plurality of view image features corresponding to the plurality of view images, and combining the temporal information and the spatial information to generate an aerial view feature at the current moment, wherein the preset aerial view query vector corresponds to a three-dimensional physical world within a preset range of the vehicle in a real scene at the current moment.
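As a rough illustration of the query-driven update, here is a minimal PyTorch sketch in which the aerial-view (BEV) query attends first to the previous BEV feature (temporal information) and then to the multi-view image features (spatial information). The dense cross-attention, dimensions, and grid sizes are assumptions; the abstract does not describe the attention geometry:

```python
import torch
import torch.nn as nn

class BEVFusion(nn.Module):
    """Update a BEV query with temporal cross-attention over the previous
    BEV feature and spatial cross-attention over multi-view image features
    (a sketch; any geometry-aware sampling is not reproduced here)."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, bev_query, prev_bev, view_feats):
        # bev_query/prev_bev: (B, H*W, dim); view_feats: (B, cams*tokens, dim)
        q, _ = self.temporal(bev_query, prev_bev, prev_bev)  # temporal information
        q, _ = self.spatial(q, view_feats, view_feats)       # spatial information
        return q                                             # BEV feature at time t

bev = BEVFusion()(torch.randn(1, 50 * 50, 128),   # preset BEV query vector
                  torch.randn(1, 50 * 50, 128),   # BEV feature at previous moment
                  torch.randn(1, 6 * 100, 128))   # six camera views, tokenized
print(bev.shape)  # torch.Size([1, 2500, 128])
```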
FACIAL EXPRESSION RECOGNITION METHOD AND SYSTEM COMBINED WITH ATTENTION MECHANISM
Provided are a facial expression recognition method and system combined with an attention mechanism. The method comprises: detecting faces comprised in each video frame in a video sequence, and extracting corresponding facial ROIs, so as to obtain facial pictures in each video frame; aligning the facial pictures in each video frame on the basis of location information of facial feature points of the facial pictures; inputting the aligned facial pictures into a residual neural network, and extracting spatial features of facial expressions corresponding to the facial pictures; inputting the spatial features of the facial expressions into a hybrid attention module to acquire fused features of the facial expressions; inputting the fused features of the facial expressions into a gated recurrent unit, and extracting temporal features of the facial expressions; and inputting the temporal features of the facial expressions into a fully connected layer, and classifying and recognizing the facial expressions.
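The CNN-attention-GRU-FC pipeline enumerated above can be wired up compactly. A minimal PyTorch sketch follows, assuming aligned face crops are already available; the small conv stack stands in for the residual network and a plain self-attention layer for the hybrid attention module, both of which the abstract leaves unspecified:

```python
import torch
import torch.nn as nn

class ExpressionRecognizer(nn.Module):
    """Spatial CNN -> attention fusion -> GRU temporal features -> FC
    classifier over a sequence of aligned face crops (a sketch)."""
    def __init__(self, feat_dim=128, num_classes=7):
        super().__init__()
        self.cnn = nn.Sequential(                    # spatial features per frame
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        self.attn = nn.MultiheadAttention(feat_dim, 4, batch_first=True)
        self.gru = nn.GRU(feat_dim, feat_dim, batch_first=True)  # temporal features
        self.fc = nn.Linear(feat_dim, num_classes)   # expression classifier

    def forward(self, frames):                       # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        f = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        f, _ = self.attn(f, f, f)                    # fused expression features
        _, h = self.gru(f)                           # final hidden state
        return self.fc(h[-1])

logits = ExpressionRecognizer()(torch.randn(2, 16, 3, 112, 112))
print(logits.shape)  # torch.Size([2, 7])
```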
Appearance Analysis Method and Electronic Device
An appearance analysis method includes an electronic device obtaining a first image associated with a first region of an object and a second image associated with a second region of the object, where the first image is collected by a first camera, the second image is collected by a second camera, and the first region is different from the second region. The electronic device provides an appearance evaluation of the object, where the appearance evaluation is determined based on the first image and the second image.
LIVENESS DETECTION METHOD AND APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM
A liveness detection method includes: obtaining a reflected audio signal and video data of an object in response to receiving a liveness detection request; performing signal processing and time-frequency analysis on the reflected audio signal to obtain time-frequency information of a processed audio signal, and extracting motion trajectory information of the object from the video data; respectively extracting features from the time-frequency information and the motion trajectory information to obtain an audio feature and a motion feature of the object; calculating first global attention information of the object according to the audio feature, and calculating second global attention information of the object according to the motion feature; and fusing the first global attention information with the second global attention information to obtain fused global information, and determining a liveness detection result of the object based on the fused global information.
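A minimal Python sketch of the final two steps: each modality's token sequence is pooled into global attention information, and the two are fused for a live/spoof decision. Attention pooling and concatenation fusion are assumed designs; the abstract does not define either computation:

```python
import torch
import torch.nn as nn

class GlobalAttentionFusion(nn.Module):
    """Pool audio and motion token sequences into global attention
    information, then fuse both for liveness classification (a sketch)."""
    def __init__(self, dim=64):
        super().__init__()
        self.audio_score = nn.Linear(dim, 1)    # attention over time-frequency tokens
        self.motion_score = nn.Linear(dim, 1)   # attention over trajectory tokens
        self.classifier = nn.Linear(2 * dim, 2) # live vs. spoof logits

    def _pool(self, tokens, scorer):
        w = torch.softmax(scorer(tokens), dim=1)    # (B, T, 1) attention weights
        return (w * tokens).sum(dim=1)              # global information (B, dim)

    def forward(self, audio_feats, motion_feats):
        g_audio = self._pool(audio_feats, self.audio_score)     # first global info
        g_motion = self._pool(motion_feats, self.motion_score)  # second global info
        return self.classifier(torch.cat([g_audio, g_motion], dim=-1))

logits = GlobalAttentionFusion()(torch.randn(4, 30, 64), torch.randn(4, 30, 64))
print(logits.shape)  # torch.Size([4, 2])
```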
AUDIOVISUAL SECONDARY HAPTIC SIGNAL RECONSTRUCTION METHOD BASED ON CLOUD-EDGE COLLABORATION
An audio-visual haptic signal reconstruction method includes first utilizing a large-scale audio-visual database stored in a central cloud to learn knowledge, and transferring the same to an edge node; then combining, by means of the edge node, a received audio-visual signal with knowledge in the central cloud, and fully mining the semantic correlation and consistency between modalities; and finally fusing the semantic features of the obtained audio and video signals and inputting the fused semantic features to a haptic generation network, thereby realizing the reconstruction of the haptic signal. The method effectively addresses the problems that the number of audio and video signals in a multimodal dataset is insufficient and that semantic tags cannot be added to all the audio-visual signals in a training dataset by means of manual annotation. Also, the semantic association between heterogeneous data of different modalities is better mined, and the heterogeneity gap between modalities is eliminated.
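The final fusion-and-generation step can be sketched directly; the cloud-side pretraining and knowledge transfer are out of scope here. In this minimal PyTorch sketch, the fusion layer, decoder shape, and signal length are assumptions:

```python
import torch
import torch.nn as nn

class HapticGenerator(nn.Module):
    """Fuse audio and video semantic features and decode a haptic signal
    (a sketch of the fusion-then-generation step only)."""
    def __init__(self, sem_dim=128, haptic_len=256):
        super().__init__()
        self.fuse = nn.Linear(2 * sem_dim, sem_dim)   # audio-visual fusion
        self.decode = nn.Sequential(                  # haptic generation network
            nn.ReLU(), nn.Linear(sem_dim, haptic_len))

    def forward(self, audio_sem, video_sem):
        fused = self.fuse(torch.cat([audio_sem, video_sem], dim=-1))
        return self.decode(fused)                     # reconstructed haptic signal

haptic = HapticGenerator()(torch.randn(1, 128), torch.randn(1, 128))
print(haptic.shape)  # torch.Size([1, 256])
```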
Video generation method and system for high resolution face swapping
A video generation method includes: obtaining a target face image and a source face image; extracting a feature of each of the source face image and the target face image through a face feature encoder, to obtain corresponding source feature codes and target feature codes; generating swapped face feature codes through a face feature exchanger according to the source feature codes and the target feature codes; generating an initial swapped face image through a face generator according to the swapped face feature codes; and fusing the initial swapped face image with the target face image through a face fuser, to obtain a final swapped face image. The face feature encoder performs hierarchical encoding on the face feature to preserve semantic details of a face, and the face feature exchanger performs further processing based on the hierarchical encoding, to obtain a hierarchical encoding of the swapped face feature with semantic details.
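The four-stage dataflow (encoder, exchanger, generator, fuser) can be shown as a wiring sketch. In this minimal Python sketch the stand-in modules are toy placeholders so the pipeline runs end to end; all of them are hypothetical interfaces, not the patent's networks:

```python
import torch
import torch.nn as nn

def swap_faces(encoder, exchanger, generator, fuser, source_img, target_img):
    """Wire the four stages named in the abstract; the modules themselves
    are supplied by the caller (hypothetical interfaces)."""
    src_codes = encoder(source_img)        # hierarchical source feature codes
    tgt_codes = encoder(target_img)        # hierarchical target feature codes
    swapped_codes = exchanger(src_codes, tgt_codes)
    initial = generator(swapped_codes)     # initial swapped face image
    return fuser(initial, target_img)      # final swapped face image

# Toy stand-ins so the pipeline runs end to end on 64x64 images.
enc = nn.Conv2d(3, 8, 3, padding=1)
exch = lambda s, t: 0.5 * (s + t)          # toy "exchange" of feature codes
gen = nn.Conv2d(8, 3, 3, padding=1)
fuse = lambda a, b: 0.5 * (a + b)          # toy blend with the target image

out = swap_faces(enc, exch, gen, fuse,
                 torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```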
JOINT PERCEPTION MODEL TRAINING METHOD, JOINT PERCEPTION METHOD, DEVICE, AND STORAGE MEDIUM
Provided are a joint perception model training method, a joint perception method, a device, and a storage medium. The joint perception model training method includes: acquiring sample images and perception tags of the sample images; acquiring a preset joint perception model, where the joint perception model includes a feature extraction network and a joint perception network; performing feature extraction on the sample images through the feature extraction network to obtain target sample features; performing joint perception through the joint perception network according to the target sample features to obtain perception prediction results; and training the preset joint perception model according to the perception prediction results and the perception tags, where the joint perception includes executing at least two perception tasks.
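A minimal PyTorch sketch of the shared-backbone, multi-head structure described above: one feature extraction network feeds at least two perception heads, and the losses are summed for joint training. The classification and attribute heads are assumed examples of the patent's unspecified perception tasks:

```python
import torch
import torch.nn as nn

class JointPerception(nn.Module):
    """Shared feature extraction network feeding two perception heads
    (a sketch; the heads are assumed examples of perception tasks)."""
    def __init__(self, feat_dim=64, num_classes=10, num_attrs=5):
        super().__init__()
        self.backbone = nn.Sequential(               # feature extraction network
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.cls_head = nn.Linear(feat_dim, num_classes)  # perception task 1
        self.attr_head = nn.Linear(feat_dim, num_attrs)   # perception task 2

    def forward(self, images):
        feats = self.backbone(images)                # target sample features
        return self.cls_head(feats), self.attr_head(feats)

cls_pred, attr_pred = JointPerception()(torch.randn(2, 3, 32, 32))
# Joint training: sum the per-task losses against the perception tags.
loss = nn.functional.cross_entropy(cls_pred, torch.tensor([1, 3])) \
     + nn.functional.binary_cross_entropy_with_logits(attr_pred, torch.rand(2, 5))
print(cls_pred.shape, attr_pred.shape)  # torch.Size([2, 10]) torch.Size([2, 5])
```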