G06V10/806

IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD AND IMAGE PROCESSING SYSTEM
20210334580 · 2021-10-28

An image processing device including a storage unit configured to store an object detection neural network trained using a first K-channel image generated from a first M-channel image and a first N-channel image generated from the first M-channel image, a reception unit configured to receive, from a sensor, a second M-channel image and a second N-channel image that include an identical subject, and an image analysis unit configured to generate, using the object detection neural network trained using the first K-channel image, object detection result information with respect to a second K-channel image generated from the second M-channel image and the second N-channel image, and output the object detection result information.
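The abstract does not say how the K-channel image is built from the M-channel and N-channel images. One common choice, assumed here purely for illustration, is per-pixel channel concatenation, so that K = M + N (for example RGB plus a single depth channel). The function name `stack_channels` and the sample data are hypothetical.

```python
def stack_channels(img_m, img_n):
    """Concatenate the per-pixel channels of two same-sized images (K = M + N).

    Images are lists of rows; each pixel is a tuple of channel values.
    Concatenation is an assumption; the abstract leaves the fusion unspecified.
    """
    same_height = len(img_m) == len(img_n)
    same_width = all(len(a) == len(b) for a, b in zip(img_m, img_n))
    if not (same_height and same_width):
        raise ValueError("images must have the same height and width")
    return [
        [tuple(px_m) + tuple(px_n) for px_m, px_n in zip(row_m, row_n)]
        for row_m, row_n in zip(img_m, img_n)
    ]


rgb = [[(10, 20, 30), (40, 50, 60)]]  # 1x2 image, M = 3 channels
depth = [[(7,), (9,)]]                # 1x2 image, N = 1 channel
fused = stack_channels(rgb, depth)    # 1x2 image, K = 4 channels
```

The same function would be applied both to the training pair and to the pair received from the sensor, so that the network sees the same channel layout at inference time.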

Systems, methods and computer program products for associating media content having different modalities
11157542 · 2021-10-26

Systems, methods, and computer program products for associating a media content clip(s) with other media content clip(s) having a different modality by determining first embedding vectors of media content items of a first modality, receiving a media content clip of a second modality, determining a second embedding vector of the media content clip of the second modality, ranking the first embedding vectors based on a distance between the embedding vectors and the second embedding vector, and selecting one or more of the media content items of the first modality based on the ranking, thereby pairing media content clips based on emotion.
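The ranking step can be sketched as a nearest-neighbour lookup in embedding space. This is a minimal illustration that assumes Euclidean distance and a shared embedding space for both modalities; the abstract only says "a distance". The names `rank_by_distance`, `tracks`, and `clip` are hypothetical.

```python
import math


def rank_by_distance(item_embeddings, query_embedding, top_k=1):
    """Return the names of the top_k items whose embeddings are closest to the query.

    item_embeddings: dict mapping item name -> embedding (first modality).
    query_embedding: embedding of the clip of the second modality.
    Euclidean distance is an assumption made for this sketch.
    """
    ranked = sorted(
        item_embeddings.items(),
        key=lambda kv: math.dist(kv[1], query_embedding),
    )
    return [name for name, _ in ranked[:top_k]]


tracks = {"track_a": (0.9, 0.1), "track_b": (0.2, 0.8), "track_c": (0.5, 0.5)}
clip = (0.25, 0.75)
best = rank_by_distance(tracks, clip, top_k=2)
```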

User-Customizable Machine-Learning in Radar-Based Gesture Detection

Various embodiments dynamically learn user-customizable input gestures. A user can transition a radar-based gesture detection system into a gesture-learning mode. In turn, the radar-based gesture detection system emits a radar field configured to detect a gesture new to the radar-based gesture detection system. The radar-based gesture detection system receives incoming radio frequency (RF) signals generated by the outgoing RF signal reflecting off the gesture, and analyzes the incoming RF signals to learn one or more identifying characteristics about the gesture. Upon learning the identifying characteristics, the radar-based gesture detection system reconfigures a corresponding input identification system to detect the gesture when the one or more identifying characteristics are next identified, and transitions out of the gesture-learning mode.

Expression recognition method, apparatus, electronic device, and storage medium

Embodiments of the present disclosure provide an expression recognition method, apparatus, electronic device and storage medium. An expression recognition model includes a convolutional neural network model, a fully connected network model and a bilinear network model. During an expression recognition process, after an image to be recognized is pre-processed to obtain a facial image and a key point coordinate vector, the facial image is computed by the convolutional neural network model to output a first feature vector, the key point coordinate vector is computed by the fully connected network model to output a second feature vector, the first feature vector and the second feature vector are computed by the bilinear network model to obtain second-order information, and an expression recognition result is in turn obtained according to the second-order information. This process is more robust to gestures and illumination, and the accuracy of expression recognition is improved.
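The "second-order information" can be pictured as the pairwise products of the two feature vectors. The sketch below assumes the bilinear step is a flattened outer product, which is a common form of bilinear pooling; the abstract does not confirm the exact operation, and the name `bilinear_features` is hypothetical.

```python
def bilinear_features(first_vec, second_vec):
    """Flattened outer product of two feature vectors.

    Each output entry is a product of one entry from each input, so the result
    captures pairwise (second-order) interactions between image features and
    key point features. The outer-product form is an assumption.
    """
    return [a * b for a in first_vec for b in second_vec]


image_features = [1, 2]         # stand-in for the CNN output
keypoint_features = [3, 4, 5]   # stand-in for the fully connected output
second_order = bilinear_features(image_features, keypoint_features)
```

The output has `len(first_vec) * len(second_vec)` entries, which is why bilinear layers are usually followed by a compact classifier.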

IMAGE PROCESSING METHOD AND APPARATUS, AND STORAGE MEDIUM

The present disclosure relates to an image processing method and apparatus, and a storage medium. The method includes: performing step-by-step convolution processing on an image to be processed to obtain a convolution result (S11); obtaining a positioning result through positioning processing according to the convolution result (S12); performing step-by-step deconvolution processing on the positioning result to obtain a deconvolution result (S13); and performing segmentation processing on the deconvolution result to segment a target object from the image to be processed (S14). Embodiments of the present disclosure implement target object positioning and segmentation at the same time in a process of image processing, and the image processing precision is improved while the speed of image processing is guaranteed.

METHODS AND APPARATUS TO IMPLEMENT PARALLEL ARCHITECTURES FOR NEURAL NETWORK CLASSIFIERS
20210319319 · 2021-10-14

Methods, apparatus, systems, and articles of manufacture are disclosed to implement parallel architectures for neural network classifiers. An example non-transitory computer readable medium comprises instructions that, when executed, cause a machine to at least: process a first stream using first neural network blocks, the first stream based on an input image; process a second stream using second neural network blocks, the second stream based on the input image; fuse a result of the first neural network blocks and the second neural network blocks; perform average pooling on the fused result; process a fully connected layer based on the result of the average pooling; and classify the image based on the output of the fully connected layer.
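The tail of the claimed pipeline (fuse, average pooling, fully connected layer, classification) can be sketched as follows. The two stream outputs are taken as given, fusion is assumed to be an elementwise sum, and the pooling is assumed to be global per channel; none of these choices is stated in the abstract. All function names and values are hypothetical.

```python
def fuse(stream_a, stream_b):
    """Elementwise sum of two feature maps shaped [channel][row][col] (assumed fusion)."""
    return [
        [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(stream_a, stream_b)
    ]


def global_average_pool(feature_map):
    """Average every channel down to a single value."""
    return [
        sum(sum(row) for row in ch) / sum(len(row) for row in ch)
        for ch in feature_map
    ]


def fully_connected(vec, weights, bias):
    """Dense layer: one weight row and one bias per output class."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def classify(stream_a, stream_b, weights, bias):
    """Fuse both streams, pool, apply the dense layer, return the arg-max class index."""
    pooled = global_average_pool(fuse(stream_a, stream_b))
    logits = fully_connected(pooled, weights, bias)
    return max(range(len(logits)), key=logits.__getitem__)


s1 = [[[1, 1], [1, 1]], [[0, 0], [0, 0]]]  # first stream output, 2 channels of 2x2
s2 = [[[1, 3], [1, 3]], [[2, 2], [2, 2]]]  # second stream output, same shape
label = classify(s1, s2, weights=[[1, 0], [0, 2]], bias=[0, 0])
```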

Keyframe Extractor
20210319230 · 2021-10-14

In one aspect, an example method includes (i) determining a blur delta that quantifies a difference between a level of blurriness of a first frame of a video and a level of blurriness of a second frame of the video, wherein the second frame is subsequent to and adjacent to the first frame; (ii) determining a contrast delta that quantifies a difference between a contrast of the first frame and a contrast of the second frame; (iii) determining a fingerprint distance between a first image fingerprint of the first frame and a second image fingerprint of the second frame; (iv) determining a keyframe score using the blur delta, the contrast delta, and the fingerprint distance; (v) based on the keyframe score, determining that the second frame is a keyframe; and (vi) outputting data indicating that the second frame is a keyframe.
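Steps (iii) to (v) can be sketched as below. The abstract does not say how the three quantities are combined or how the fingerprint distance is measured, so this sketch assumes a weighted sum of absolute deltas, a Hamming distance between integer bit fingerprints, and a fixed threshold. All names, weights, and the threshold are hypothetical.

```python
def hamming_distance(fingerprint_a, fingerprint_b):
    """Number of differing bits between two integer image fingerprints (assumed metric)."""
    return bin(fingerprint_a ^ fingerprint_b).count("1")


def keyframe_score(blur_delta, contrast_delta, fingerprint_distance,
                   weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three change signals (assumed combination rule)."""
    w_blur, w_contrast, w_fp = weights
    return (w_blur * abs(blur_delta)
            + w_contrast * abs(contrast_delta)
            + w_fp * fingerprint_distance)


def is_keyframe(score, threshold):
    """The second frame is treated as a keyframe when the score reaches the threshold."""
    return score >= threshold


fp_distance = hamming_distance(0b1011, 0b0010)
score = keyframe_score(blur_delta=0.5, contrast_delta=-0.25,
                       fingerprint_distance=fp_distance)
second_frame_is_keyframe = is_keyframe(score, threshold=2.0)
```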

Transition Detector Neural Network
20210321150 · 2021-10-14

In one aspect, an example method includes (i) extracting a sequence of audio features from a portion of a sequence of media content; (ii) extracting a sequence of video features from the portion of the sequence of media content; (iii) providing the sequence of audio features and the sequence of video features as an input to a transition detector neural network that is configured to classify whether or not a given input includes a transition between different content segments; (iv) obtaining from the transition detector neural network classification data corresponding to the input; (v) determining that the classification data is indicative of a transition between different content segments; and (vi) based on determining that the classification data is indicative of a transition between different content segments, outputting transition data indicating that the portion of the sequence of media content includes a transition between different content segments.

Method and system for processing a task with robustness to missing input information

A unit is disclosed for generating combined feature maps in accordance with a processing task to be performed, the unit comprising a feature map generating unit for receiving more than one modality and for generating more than one corresponding feature map using more than one corresponding transformation; wherein the generating of each of the more than one corresponding feature map is performed by applying a given corresponding transformation on a given corresponding modality, wherein the more than one corresponding transformation is generated following an initial training performed in accordance with the processing task to be performed; and a combining unit for selecting and combining the corresponding more than one feature map generated by the feature map generating unit in accordance with at least one combining operation and for providing at least one corresponding combined feature map; wherein the combining unit is operating in accordance with the processing task to be performed and the combining operation reduces each corresponding numeric value of each of the more than one feature map generated by the feature map generating unit down to one numeric value in the at least one corresponding combined feature map.
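A combining operation that reduces the corresponding values of several feature maps to one value, and that still works when a modality is missing, can be sketched as an elementwise mean over whichever maps are present. The mean is only one possible operation and is an assumption here, as are the name `combine_feature_maps` and the use of `None` to mark a missing modality.

```python
def combine_feature_maps(feature_maps):
    """Elementwise mean over the available feature maps.

    feature_maps: list of flattened feature maps (lists of floats), with None
    standing in for a missing modality. Each output position holds a single
    value computed from the corresponding values of all available maps.
    """
    available = [fm for fm in feature_maps if fm is not None]
    if not available:
        raise ValueError("at least one modality must be present")
    return [sum(values) / len(values) for values in zip(*available)]


all_present = combine_feature_maps([[1.0, 2.0], [3.0, 6.0]])
one_missing = combine_feature_maps([[1.0, 2.0], None])
```

Because the output shape does not depend on how many modalities arrived, the layers after the combining unit can stay unchanged when an input is absent.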

Multisensor Data Fusion Method and Apparatus
20210311167 · 2021-10-07

A multisensor data fusion perception method includes receiving feature data from a plurality of types of sensors, obtaining static feature data and dynamic feature data from the feature data, constructing current static environment information based on the static feature data and reference dynamic target information, and constructing current dynamic target information based on the dynamic feature data and reference static environment information, such that construction of a dynamic target and construction of a static environment are performed by referring to each other's construction results, and perception covers both the dynamic target and the static environment in the environment in which a moving carrier is located.