G06V10/806

KEY POINT DETECTION METHOD, MODEL TRAINING METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM

There is provided a key point detection method, a model training method, an electronic device and a storage medium, which relate to the field of artificial intelligence, particularly to computer vision and deep learning technologies, and may be used in scenarios such as behavior recognition, human-body special-effect generation, and entertainment game interaction. The key point detection method includes: extracting features of an image to obtain image features of the image; acquiring graph information of key points of a target in the image based on the image features, the graph information including a location relationship graph of the key points and location information of a central point among the key points; and acquiring location information of non-central points among the key points based on the location relationship graph of the key points and the location information of the central point.
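
A minimal numpy sketch of the last step, assuming the location relationship graph stores per-key-point offsets from the central point; the abstract does not fix a concrete graph encoding, so `recover_keypoints` and the offset representation are hypothetical.

```python
# Hypothetical sketch: recover non-central key points from the central
# point plus offsets read out of the location relationship graph.
import numpy as np

def recover_keypoints(center_xy: np.ndarray, offsets: np.ndarray) -> np.ndarray:
    """center_xy: (2,) location of the central key point.
    offsets: (K, 2) per-key-point displacements from the graph.
    Returns (K, 2) absolute locations of the non-central key points."""
    return center_xy[None, :] + offsets

center = np.array([120.0, 80.0])           # e.g. pelvis of a human body
graph_offsets = np.array([[0.0, -40.0],    # head relative to the center
                          [-25.0, 10.0],   # left hand
                          [25.0, 10.0]])   # right hand
print(recover_keypoints(center, graph_offsets))
```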

METHOD, APPARATUS, AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM FOR CONFIRMING A PERCEIVED POSITION OF A TRAFFIC LIGHT

A method, apparatus, and computer-readable medium for confirming a perceived position of a traffic light. Identifiers and results of a first perception of traffic lights associated with the identifiers are obtained, the results of the first perception including a first estimation of an ellipse encompassing each of the traffic lights. Results of a second perception of traffic lights associated with the identifiers are received, the results of the second perception including a second estimation of an ellipse encompassing each of the traffic lights. Based on the first perception and the second perception, association parameters are calculated for each possible pair of estimated ellipses. Based on the calculated association parameters, matching pairs of estimated ellipses are selected, and each matching pair of estimated ellipses is fused.
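
The calculate-select-fuse flow is concrete enough for a small sketch. The following Python makes two loud assumptions: each ellipse is a `(cx, cy, a, b, theta)` tuple, and the association parameter is a plain center distance (the abstract leaves the actual metric unspecified); matching is greedy and fusion is parameter averaging.

```python
# Hedged sketch of ellipse matching and fusing between two perceptions.
import numpy as np
from itertools import product

def associate(e1, e2):
    """Association parameter between two ellipses (cx, cy, a, b, theta);
    a simple center distance stands in for the unspecified metric."""
    return float(np.hypot(e1[0] - e2[0], e1[1] - e2[1]))

def match_and_fuse(first, second, max_dist=20.0):
    # Score every possible pair, then greedily select unused pairs.
    pairs = sorted(product(range(len(first)), range(len(second))),
                   key=lambda ij: associate(first[ij[0]], second[ij[1]]))
    used_i, used_j, fused = set(), set(), []
    for i, j in pairs:
        if i in used_i or j in used_j or \
           associate(first[i], second[j]) > max_dist:
            continue
        used_i.add(i); used_j.add(j)
        # Fuse a matching pair by averaging ellipse parameters.
        fused.append((np.asarray(first[i]) + np.asarray(second[j])) / 2.0)
    return fused
```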

MULTIMODAL SENTIMENT CLASSIFICATION

Sentiment classification can be implemented by an entity-level multimodal sentiment classification neural network. The neural network can include left, right, and target entity subnetworks. The neural network can further include an image network that generates representation data that is combined and weighted with data output by the left, right, and target entity subnetworks to output a sentiment classification for an entity included in a network post.
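
A hedged PyTorch sketch of that architecture follows: three recurrent subnetworks over the left context, right context, and target entity, a linear image network over pooled image features, and a learned softmax weighting that combines the four streams before the sentiment head. All layer types, sizes, and the gating scheme are assumptions, not the patent's actual design.

```python
# Assumed architecture sketch for entity-level multimodal sentiment.
import torch
import torch.nn as nn

class EntitySentimentNet(nn.Module):
    def __init__(self, dim=128, n_classes=3):
        super().__init__()
        self.left = nn.GRU(dim, dim, batch_first=True)    # left context
        self.right = nn.GRU(dim, dim, batch_first=True)   # right context
        self.target = nn.GRU(dim, dim, batch_first=True)  # target entity
        self.image = nn.Linear(2048, dim)  # e.g. pooled CNN image features
        self.gate = nn.Linear(4 * dim, 4)  # weights over the four streams
        self.head = nn.Linear(dim, n_classes)

    def forward(self, left, right, target, img_feat):
        _, l = self.left(left); _, r = self.right(right); _, t = self.target(target)
        i = self.image(img_feat)
        streams = torch.stack([l[-1], r[-1], t[-1], i], dim=1)  # (B, 4, dim)
        w = torch.softmax(self.gate(streams.flatten(1)), dim=-1)
        combined = (w.unsqueeze(-1) * streams).sum(dim=1)       # weighted mix
        return self.head(combined)

net = EntitySentimentNet()
logits = net(torch.randn(2, 5, 128), torch.randn(2, 5, 128),
             torch.randn(2, 3, 128), torch.randn(2, 2048))
```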

METHOD, APPARATUS, DEVICE AND STORAGE MEDIUM FOR TRAINING VIDEO RECOGNITION MODEL
20230069197 · 2023-03-02

A method and an apparatus for training a video recognition model are provided. The method may include: dividing a sample video into a plurality of sample video segments; sampling a part of the sample video frames from a sample video segment; inputting the part of sample video frames into a feature extraction network to obtain feature information of the sample video segment; performing convolution fusion on the feature information by using a dynamic segment fusion module to obtain fusion feature information, where a convolution kernel of the dynamic segment fusion module varies with different video inputs; inputting the fusion feature information into a fully connected layer to obtain an estimated category of the sample video; and performing parameter adjustment based on a difference between the true category tag and the estimated category to obtain the video recognition model.
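
The distinctive piece is the dynamic segment fusion module, whose convolution kernel varies with the input. Below is one way such a module could look in PyTorch: a small generator produces a per-video 1-D kernel that is then applied across the segment axis. The generator design and the softmax kernel normalization are assumptions.

```python
# Hedged sketch of a dynamic segment fusion module: the fusion kernel
# is generated from the input itself, so it differs per video.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicSegmentFusion(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        self.gen = nn.Linear(channels, kernel_size)  # per-video kernel

    def forward(self, x):            # x: (B, T_segments, C)
        kernel = torch.softmax(self.gen(x.mean(dim=1)), dim=-1)  # (B, K)
        out = []
        for b in range(x.size(0)):   # depthwise-style conv per video
            k = kernel[b].view(1, 1, -1).expand(x.size(2), 1, -1)
            xb = x[b].t().unsqueeze(0)                # (1, C, T)
            out.append(F.conv1d(xb, k, padding=self.kernel_size // 2,
                                groups=x.size(2)).squeeze(0).t())
        return torch.stack(out)      # fused feature information, (B, T, C)

fused = DynamicSegmentFusion(channels=256)(torch.randn(2, 8, 256))
```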

Action Recognition Method, Apparatus and Device, Storage Medium and Computer Program Product

The present subject matter discloses an action recognition method, apparatus and device, a storage medium, and a computer program product, belonging to the field of image recognition. Multiple video frames in a target video are obtained. Feature extraction is performed on each of the video frames across multiple dimensions to obtain multiple multi-channel feature patterns, where each video frame corresponds to one multi-channel feature pattern and each channel represents one dimension. An attention weight of each multi-channel feature pattern is determined based on the similarity between every two multi-channel feature patterns; the attention weight represents the degree of correlation between the corresponding multi-channel feature pattern and an action performed by an object in the target video. The type of the action is determined based on the multiple multi-channel feature patterns and the determined attention weights.
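
The attention step can be sketched directly from the description: flatten each frame's multi-channel feature pattern, compute pairwise similarities, and turn each frame's mean similarity to the others into a weight. Cosine similarity and softmax normalization are assumed choices; the abstract does not name a specific similarity or normalization.

```python
# Sketch of similarity-based attention over per-frame feature patterns.
import torch
import torch.nn.functional as F

def frame_attention(feat_maps: torch.Tensor) -> torch.Tensor:
    """feat_maps: (T, C, H, W), one multi-channel feature pattern per frame.
    Returns (T,) attention weights summing to 1."""
    flat = F.normalize(feat_maps.flatten(1), dim=1)   # (T, C*H*W)
    sim = flat @ flat.t()                             # (T, T) pairwise sims
    return torch.softmax(sim.mean(dim=1), dim=0)      # weight per frame

weights = frame_attention(torch.randn(8, 64, 14, 14))
```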

VIDEO PROCESSING METHOD AND APPARATUS

A video clip location technology in the field of computer vision, pertaining to artificial intelligence, provides a video processing method and apparatus. The method includes: obtaining a semantic feature of an input sentence; performing semantic enhancement on a video frame based on the semantic feature to obtain a video feature of the video frame, where the video feature includes the semantic feature; and determining, based on the semantic feature and the video feature, whether the video clip to which the video frame belongs is a target video clip corresponding to the input sentence. The method helps improve the accuracy of recognizing a target video clip corresponding to an input sentence.
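
A minimal PyTorch sketch of the per-frame flow follows: the sentence's semantic feature is fused into the frame feature (the "semantic enhancement"), and both features then drive a matching score. Fusion by concatenation plus projection is an assumption; the patent does not specify the operator.

```python
# Hedged sketch: semantic enhancement of a frame feature, then scoring.
import torch
import torch.nn as nn

class ClipMatcher(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.enhance = nn.Linear(2 * dim, dim)  # injects sentence semantics
        self.score = nn.Linear(2 * dim, 1)      # semantic + video feature

    def forward(self, frame_feat, sent_feat):
        video_feat = torch.relu(
            self.enhance(torch.cat([frame_feat, sent_feat], dim=-1)))
        logit = self.score(torch.cat([video_feat, sent_feat], dim=-1))
        return torch.sigmoid(logit)  # probability the clip is the target

p = ClipMatcher()(torch.randn(4, 256), torch.randn(4, 256))
```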

SENSOR FUSION APPROACH FOR PLASTICS IDENTIFICATION

Methods and systems are described for using multiple hyperspectral cameras sensitive to different wavelengths to predict characteristics of objects for further processing, including recycling. The multiple hyperspectral images can be used to predict higher-resolution spectra with a trained machine learning model; the higher-resolution spectra are more easily analyzed to sort plastics into a recyclability category. The hyperspectral images may also be used to identify and analyze dark or black plastics, which are challenging to detect at SWIR, MWIR, and other wavelengths. The machine learning model may also predict the base polymers and contaminants of plastic objects for recycling, as well as recyclability and other characteristics.
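
The two-stage use of a trained model can be sketched with stand-in data: a regressor maps concatenated low-resolution spectra from two cameras to a higher-resolution spectrum, and a classifier maps that spectrum to a recyclability category. The band counts, model choices, and all data below are placeholders, not the described system.

```python
# Placeholder sketch: two-camera spectra -> high-res spectrum -> category.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LogisticRegression

bands_a, bands_b, bands_hi = 32, 48, 256      # assumed band counts
X = np.random.rand(1000, bands_a + bands_b)   # stand-in training spectra
y_hi = np.random.rand(1000, bands_hi)         # stand-in high-res targets
labels = np.random.randint(0, 4, 1000)        # stand-in categories

upsampler = MLPRegressor(hidden_layer_sizes=(128,), max_iter=50).fit(X, y_hi)
classifier = LogisticRegression(max_iter=200).fit(upsampler.predict(X), labels)

def categorize(spectrum_a, spectrum_b):
    # Predict the higher-resolution spectrum, then its recyclability class.
    hi = upsampler.predict(np.concatenate([spectrum_a, spectrum_b])[None, :])
    return classifier.predict(hi)[0]
```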

METHOD OF RECOGNIZING OBJECT, ELECTRONIC DEVICE AND STORAGE MEDIUM
20220327803 · 2022-10-13

A method of recognizing an object, an electronic device and a storage medium are provided, which relate to the field of data processing, in particular to object recognition. The method includes: acquiring position information and image data of an object to be detected; performing feature extraction on the position information and the image data of the object to be detected to obtain a first target concatenated feature; inputting the first target concatenated feature into a pre-trained deep learning model to obtain a second target concatenated feature; determining a second sample concatenated feature that matches the second target concatenated feature by matching the second target concatenated feature against each second sample concatenated feature obtained by processing a first sample concatenated feature of a sample object; and determining the object to be detected to be the sample object corresponding to the matched second sample concatenated feature.
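
The final matching step reduces to a nearest-neighbor search over the second sample concatenated features. The sketch below assumes cosine similarity and a rejection threshold; the abstract specifies neither.

```python
# Assumed matching step: nearest sample feature by cosine similarity.
import numpy as np

def identify(target_feat, sample_feats, sample_ids, threshold=0.8):
    """target_feat: (D,) second target feature; sample_feats: (N, D)
    second sample features; sample_ids: N identifiers. Returns the
    matched sample id, or None when no sample clears the threshold."""
    t = target_feat / np.linalg.norm(target_feat)
    s = sample_feats / np.linalg.norm(sample_feats, axis=1, keepdims=True)
    sims = s @ t
    best = int(np.argmax(sims))
    return sample_ids[best] if sims[best] >= threshold else None
```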

Change-aware person identification

A method for training a model, the method including: defining a primary model for identifying a class of input data based on a first characteristic of the input data; defining a secondary model for detecting a change to a second characteristic between multiple input data captured at different times; defining a forward link from an output of an intermediate layer of the secondary model to an input of an intermediate layer of the primary model; and training the primary model and the secondary model in parallel based on a training set of input data.
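
The forward link is the distinctive structural element: an intermediate activation of the secondary (change-detection) model feeds an intermediate layer of the primary (identification) model, and the two are trained in parallel on a joint loss. The toy layers, shapes, and loss weighting below are assumptions.

```python
# Hedged sketch of a primary model with a forward link from the
# secondary model's intermediate layer; both are trained in parallel.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinkedModel(nn.Module):
    def __init__(self, dim=128, n_ids=100):
        super().__init__()
        self.sec_in = nn.Linear(dim, dim)    # secondary, early layers
        self.sec_out = nn.Linear(dim, 2)     # change / no change
        self.pri_in = nn.Linear(dim, dim)    # primary, early layers
        self.link = nn.Linear(dim, dim)      # the forward link
        self.pri_out = nn.Linear(dim, n_ids) # identity classes

    def forward(self, x_now, x_then):
        sec_mid = torch.relu(self.sec_in(x_now - x_then))
        change_logits = self.sec_out(sec_mid)
        # Forward link: secondary's intermediate output joins the primary.
        pri_mid = torch.relu(self.pri_in(x_now)) + self.link(sec_mid)
        return self.pri_out(pri_mid), change_logits

model = LinkedModel()
id_logits, ch_logits = model(torch.randn(4, 128), torch.randn(4, 128))
loss = F.cross_entropy(id_logits, torch.randint(0, 100, (4,))) + \
       F.cross_entropy(ch_logits, torch.randint(0, 2, (4,)))
loss.backward()  # one joint backward pass updates both models
```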

Infrared And Color-Enhanced Partial Image Blending

First objects are detected in an infrared image from an infrared camera and second objects are detected in a color image from a color camera. The first objects are compared with the second objects to determine pairs of matching first objects and second objects. For each matching pair, a respective region of an output image is colorized by setting the colors of pixels in the region based on the colors of the pixels of the second object in the matching pair; the pixels in the region have locations corresponding to the locations of the pixels in the first object of the matching pair. When the colorizing is complete, pixels not in the colorized regions retain the intensities of the infrared image, so the output image is a version of the infrared image with regions colorized according to the color image.
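
The colorization step can be sketched with plain numpy, assuming objects are axis-aligned boxes of equal size within a matched pair and the matches are already known; the "keep IR intensity, borrow color ratios" rule below is one plausible reading of setting region colors from the second object.

```python
# Hedged sketch: colorize matched regions of an IR image from a color image.
import numpy as np

def blend(ir_img, color_img, matches):
    """ir_img: (H, W) uint8 infrared intensities; color_img: (H, W, 3) uint8.
    matches: list of ((x, y, w, h) box in the IR image,
                      (x, y, w, h) box in the color image), equal box sizes."""
    out = np.repeat(ir_img[:, :, None], 3, axis=2)  # IR intensity everywhere
    for (ix, iy, iw, ih), (cx, cy, cw, ch) in matches:
        ir_patch = ir_img[iy:iy + ih, ix:ix + iw].astype(float)
        c = color_img[cy:cy + ch, cx:cx + cw].astype(float)
        lum = c.mean(axis=2, keepdims=True) + 1e-6
        # Keep the IR intensity, borrow the per-pixel color ratios.
        colorized = np.clip(c / lum * ir_patch[..., None], 0, 255)
        out[iy:iy + ih, ix:ix + iw] = colorized.astype(np.uint8)
    return out
```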