G06V10/806

UPSAMPLING AND REFINING SEGMENTATION MASKS
20230132180 · 2023-04-27

The present disclosure relates to systems, methods, and non-transitory computer-readable media that upsample and refine segmentation masks. Indeed, in one or more implementations, a segmentation mask refinement and upsampling system upsamples a preliminary segmentation mask utilizing a patch-based refinement process to generate a patch-based refined segmentation mask. The segmentation mask refinement and upsampling system then fuses the patch-based refined segmentation mask with an upsampled version of the preliminary segmentation mask. By fusing the patch-based refined segmentation mask with the upsampled preliminary segmentation mask, the segmentation mask refinement and upsampling system maintains a global perspective and helps avoid artifacts due to the local patch-based refinement process.
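As a rough sketch of the fusion step described above, assuming a simple nearest-neighbor upsampler and a fixed blend weight `alpha` (the patent does not specify either; both are illustrative choices):

```python
def upsample_nearest(mask, scale):
    """Nearest-neighbor upsampling of a 2D mask (list of lists)."""
    return [[mask[i // scale][j // scale]
             for j in range(len(mask[0]) * scale)]
            for i in range(len(mask) * scale)]

def fuse_masks(refined, upsampled, alpha=0.5):
    """Pixel-wise blend of the patch-refined mask with the upsampled
    preliminary mask, keeping global context from the latter."""
    return [[alpha * r + (1 - alpha) * u
             for r, u in zip(row_r, row_u)]
            for row_r, row_u in zip(refined, upsampled)]
```

The blend keeps each fused pixel anchored to the globally consistent preliminary mask, which is one way to suppress artifacts introduced by purely local patch refinement.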

GRADING APPARATUS AND METHOD BASED ON DIGITAL DATA
20230127555 · 2023-04-27

A grading apparatus and a method based on digital data are provided. In the method, feature information of an image is obtained through a first model. The content of the image includes a real object, and the first model is trained based on a deep learning algorithm. A first inference result is determined according to a first feature in the feature information. The first feature is a region feature corresponding to objects, and the first inference result is one or more defects on the real object. A second inference result of a second feature in the feature information is determined through a second model based on a semantic algorithm. The second feature is related to locations, and the second inference result is related to the context presented by the real object. The first and second inference results are fused to obtain a grading result of the real object.
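A minimal sketch of the final fusion step, assuming defects carry a severity score and context is a label; the grade bands and the doubling rule for a hypothetical "critical_region" context are purely illustrative:

```python
def fuse_grading(defects, context):
    """Fuse defect detections (first inference result) with scene
    context (second inference result) into a letter grade."""
    penalty = sum(d["severity"] for d in defects)
    if context == "critical_region":
        penalty *= 2  # defects in a critical context weigh more
    if penalty == 0:
        return "A"
    return "B" if penalty < 5 else "C"
```

The point of the sketch is only that the two inference results are combined, not summed independently: the semantic context modulates how the region-level defects are scored.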

Multi-modal emotion recognition device, method, and storage medium using artificial intelligence
11475710 · 2022-10-18

A multi-modal emotion recognition system is disclosed. The system includes a data input unit for receiving video data and voice data of a user; a data pre-processing unit including a voice pre-processing unit for generating voice feature data from the voice data and a video pre-processing unit for generating one or more face feature data from the video data; and a preliminary inference unit for generating situation determination data indicating whether the user's situation changes over a temporal sequence, based on the video data. The system further comprises a main inference unit for generating at least one sub feature map based on the voice feature data or the face feature data, and inferring the user's emotion state based on the sub feature map and the situation determination data.

METHOD AND ELECTRONIC DEVICE FOR SEGMENTING OBJECTS IN SCENE

A method for segmenting objects in a scene by an electronic device is provided. The method includes inputting at least one input frame of the scene into a pre-trained neural network model, the scene including a plurality of objects; determining a position and a shape of each object of the plurality of objects in the scene using the pre-trained neural network model; determining an array of coefficients for pixels associated with each object of the plurality of objects in the scene using the pre-trained neural network model; and generating a segment mask for each object of the plurality of objects based on the position, the shape, and the array of coefficients for each object of the plurality of objects in the scene.
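The coefficient-based mask generation resembles prototype-coefficient mask assembly: a sketch of that final step, assuming the network emits a set of prototype maps plus per-object coefficients (the prototype formulation and the threshold are assumptions, not stated in the abstract):

```python
def assemble_mask(prototypes, coeffs, threshold=0.5):
    """Linearly combine prototype maps weighted by an object's
    coefficient array, then threshold into a binary segment mask."""
    h, w = len(prototypes[0]), len(prototypes[0][0])
    scores = [[sum(c * p[i][j] for c, p in zip(coeffs, prototypes))
               for j in range(w)] for i in range(h)]
    return [[1 if v > threshold else 0 for v in row] for row in scores]
```

In such a scheme the position and shape determined per object would then crop or gate this assembled mask to the object's region.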

Method of obtaining mask frame data, computing device, and readable storage medium

The present disclosure describes techniques for generating a mask frame data segment corresponding to a video frame. The disclosed techniques include obtaining a frame of a video; identifying a main area of the frame using an image segmentation algorithm; and generating a mask frame data segment corresponding to the frame based on the main area of the frame, wherein the generating a mask frame data segment corresponding to the frame based on the main area of the frame further comprises generating the mask frame data segment based on a timestamp of the frame in the video, a width and a height of the main area of the frame.
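The mask frame data segment described above carries a timestamp plus the main area's width and height; a minimal sketch of such a record, with the byte encoding of the mask being an assumption:

```python
from dataclasses import dataclass

@dataclass
class MaskFrameDataSegment:
    timestamp_ms: int  # position of the frame in the video
    width: int         # width of the main area
    height: int        # height of the main area
    mask: bytes        # encoded main-area mask (encoding is an assumption)

def build_segment(frame_ts, main_area_mask):
    """Pack a 2D binary main-area mask into a segment record."""
    h = len(main_area_mask)
    w = len(main_area_mask[0]) if h else 0
    payload = bytes(v for row in main_area_mask for v in row)
    return MaskFrameDataSegment(frame_ts, w, h, payload)
```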

Landing tracking control method and system based on lightweight twin network and unmanned aerial vehicle

A landing tracking control method comprises two stages: a tracking model training stage and an unmanned aerial vehicle real-time tracking stage. The method uses a modified lightweight feature extraction network, Snet, so that feature extraction speed is increased to better meet real-time requirements. Channel information is weighted according to its importance, so that effective features are differentiated and utilized more purposefully and tracking precision is improved. To improve the training effect of the network, the loss function of the RPN network is optimized: the regression precision of the target frame is measured using CIOU, the calculation of the classification loss function is adjusted according to CIOU, and the relation between the regression network and the classification network is thereby enhanced.
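The CIOU metric used for target-frame regression is well defined: IoU minus a normalized center-distance term minus an aspect-ratio consistency term. A sketch for axis-aligned boxes given as (x1, y1, x2, y2):

```python
import math

def ciou(box_a, box_b):
    """Complete IoU: IoU - center-distance penalty - aspect-ratio penalty."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union areas.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union else 0.0
    # Squared center distance over squared enclosing-box diagonal.
    rho2 = (((ax1 + ax2) - (bx1 + bx2)) ** 2
            + ((ay1 + ay2) - (by1 + by2)) ** 2) / 4
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw * cw + ch * ch
    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (math.atan((ax2 - ax1) / (ay2 - ay1))
                              - math.atan((bx2 - bx1) / (by2 - by1))) ** 2
    alpha = v / (1 - iou + v) if (1 - iou + v) else 0.0
    return iou - rho2 / c2 - alpha * v
```

How the classification loss is adjusted by this value is specific to the patent and is not reproduced here.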

Retail inventory shrinkage reduction via action recognition

This disclosure includes technologies for action recognition in general. The disclosed system may automatically detect various types of actions in a video, including reportable actions that cause shrinkage in a practical application for loss prevention in the retail industry. Further, appropriate responses may be invoked if a reportable action is recognized. In some embodiments, a three-branch architecture may be used in a machine learning model for action and/or activity recognition. The three-branch architecture may include a main branch for action recognition, an auxiliary branch for learning/identifying an actor (e.g., human parsing) related to an action, and an auxiliary branch for learning/identifying a scene related to an action. In this three-branch architecture, the knowledge of the actor and the scene may be integrated in two different levels for action and/or activity recognition.

System and method for media segment identification
11601713 · 2023-03-07 ·

A system and method for identifying media segments using audio-augmented image cross-comparison is disclosed, in which a media segment identifying system analyses both audio and video content, producing a unique identifier to compare with previously identified media segments in a media segment database. The characteristic landmark-linked-image-comparisons are constructed by first identifying an audio landmark. The audio landmark is an audio peak that exceeds a predetermined threshold. Two digital images are then obtained, one associated directly with the audio landmark, and one obtained a predetermined landmark time removed from the first image. The two images are then used to provide a characteristic landmark-linked-image-comparison. The pair of images is reduced in pixel size and converted to grayscale. Corresponding pixels are compared to form a numeric comparison. One image is mirrored before comparison to reduce the possibility of null comparisons.
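The pixel-comparison step is concrete enough to sketch. Assuming the two images have already been downscaled and converted to grayscale (2D lists of intensities), and using a sum of absolute differences as the numeric comparison (the abstract does not name the exact metric):

```python
def mirror(img):
    """Flip each row of a grayscale image left-to-right."""
    return [row[::-1] for row in img]

def compare_images(img_a, img_b):
    """Mirror one image, then sum absolute differences of corresponding
    pixels to form the numeric comparison for the landmark pair."""
    flipped = mirror(img_b)
    return sum(abs(a - b)
               for row_a, row_b in zip(img_a, flipped)
               for a, b in zip(row_a, row_b))
```

Mirroring one image before comparison means two identical images no longer produce a trivially null comparison, which is the stated motivation.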

Video-based activity recognition
11636694 · 2023-04-25

Systems and techniques are provided for performing video-based activity recognition. For example, a process can include extracting, using a first machine learning model, first one or more features from a first frame and second one or more features from a second frame. The first one or more features and the second one or more features are associated with a person driving a vehicle. The process can include processing, using a second machine learning model, the first one or more features and the second one or more features. The process can include determining, based on processing of the first one or more features and the second one or more features using the second machine learning model, at least one activity associated with the person driving the vehicle.
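The two-model pipeline can be sketched with trivial stand-ins for both learning models; the feature definitions and the threshold rule below are purely illustrative, not the patent's models:

```python
def extract_features(frame):
    """First-model stand-in: reduce a frame (2D intensity grid)
    to a small feature vector."""
    flat = [v for row in frame for v in row]
    return (sum(flat) / len(flat), max(flat))

def classify_activity(feature_seq):
    """Second-model stand-in: decide an activity from the per-frame
    features. The threshold rule is purely illustrative."""
    mean_intensity = sum(f[0] for f in feature_seq) / len(feature_seq)
    return "distracted" if mean_intensity > 0.5 else "attentive"

def recognize(frames):
    """Extract features per frame, then classify across frames."""
    return classify_activity([extract_features(f) for f in frames])
```

The structure mirrors the claim: one model maps each frame to features, and a second model reasons over the features from multiple frames to determine the activity.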

IMAGE FEATURE COMBINATION FOR IMAGE-BASED OBJECT RECOGNITION
20230123624 · 2023-04-20

Methods, systems, and articles of manufacture to improve image recognition searching are disclosed. In some embodiments, a first document image of a known object is used to generate one or more other document images of the same object by applying one or more techniques for synthetically generating images. The synthetically generated images correspond to different variations in conditions under which a potential query image might be captured. Extracted features from an initial image of a known object and features extracted from the one or more synthetically generated images are stored, along with their locations, as part of a common model of the known object. In other embodiments, image recognition search effectiveness is improved by transforming the location of features of multiple images of a same known object into a common coordinate system. This can enhance the accuracy of certain aspects of existing image search/recognition techniques including, for example, geometric verification.
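Mapping feature locations from multiple images into a common coordinate system is typically done with a per-image transform estimated between the images; a sketch assuming an affine parameterization (a, b, tx, c, d, ty), which is an assumption rather than anything the abstract specifies:

```python
def to_common_coords(points, transform):
    """Map feature locations (x, y) into a common coordinate system
    via an affine transform (a, b, tx, c, d, ty)."""
    a, b, tx, c, d, ty = transform
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]
```

Once all images' features live in one coordinate system, geometric verification can check spatial consistency of matches across the whole model of the object rather than per image.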