G06V10/806

Landslide recognition method based on Laplacian pyramid remote sensing image fusion

A landslide recognition method based on Laplacian pyramid remote sensing image fusion includes: reconstructing an original remote sensing image from extracted local features and global features of remote sensing images through a Laplacian pyramid fusion module to generate a fused image; constructing a deep learning semantic segmentation model through a semantic segmentation network; labeling the fused image to obtain a dataset of landslide disaster label maps; training the deep learning semantic segmentation model on the dataset; and saving the model once the loss curve converges and the landslide recognition accuracy of the deep learning semantic segmentation model on remote sensing images meets a requirement, achieved by modifying the structure of the semantic segmentation network and adjusting the parameters of the model. Combined with the Laplacian-pyramid-based image fusion model, the method can provide an effective decision-making basis for the prevention and mitigation of landslide disasters.
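The abstract does not disclose the fusion algorithm itself; the sketch below shows the general Laplacian pyramid fusion technique it builds on, with a simple block-average downsample and max-absolute-coefficient selection standing in for whatever filters and fusion rule the claimed method actually uses.

```python
import numpy as np

def downsample(img):
    # 2x downsample by averaging 2x2 blocks (a stand-in for Gaussian blur + decimation).
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    x = img[:h, :w]
    return (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0

def upsample(img, shape):
    # Nearest-neighbour upsampling, trimmed back to the target shape.
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels):
    pyr, cur = [], img.astype(np.float64)
    for _ in range(levels - 1):
        down = downsample(cur)
        pyr.append(cur - upsample(down, cur.shape))  # band-pass detail layer
        cur = down
    pyr.append(cur)  # low-frequency residual
    return pyr

def fuse(img_a, img_b, levels=3):
    pa = laplacian_pyramid(img_a, levels)
    pb = laplacian_pyramid(img_b, levels)
    # Keep the stronger (larger-magnitude) coefficient at every pyramid position.
    fused = [np.where(np.abs(a) >= np.abs(b), a, b) for a, b in zip(pa, pb)]
    # Reconstruct the fused image from the coarsest level upward.
    out = fused[-1]
    for lap in reversed(fused[:-1]):
        out = upsample(out, lap.shape) + lap
    return out
```

Because reconstruction exactly inverts the decomposition here, fusing an image with itself returns the original image, a useful sanity check on any pyramid fusion implementation.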

System and Method for Identity Preservative Representation of Persons and Objects Using Spatial and Appearance Attributes

A method is described for processing images of persons or objects to generate an identity preservative feature descriptor learnt for each person or object. The method includes obtaining an image of a person or object, extracting at least one spatial attribute of the person or object from the obtained image, and extracting at least one appearance feature of the person or object from the image by using a mapping function that translates image pixels into appearance attributes represented by at least one numerical feature. The method also includes combining the at least one spatial attribute and the at least one appearance feature to generate a unique feature descriptor representing the person or object, and assigning the unique feature descriptor to the image, so that feature descriptors representing the same person or object can be compared to feature descriptors representing different people or objects according to at least a distance from each other under a predefined mathematical pseudo-distance metric.
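The patent leaves the attribute encodings and the pseudo-distance metric abstract; as a minimal sketch, spatial and appearance attributes can be concatenated into one descriptor and compared with a Euclidean distance (one admissible choice of metric, not necessarily the claimed one).

```python
import math

def make_descriptor(spatial, appearance):
    # Combine spatial attributes (e.g. a normalized box centre/size) with
    # appearance features into a single identity descriptor.
    return tuple(spatial) + tuple(appearance)

def descriptor_distance(d1, d2):
    # Euclidean distance as one possible pseudo-distance metric:
    # small for the same identity, larger for different identities.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))
```

A matcher would then declare two detections the same identity when their descriptor distance falls below a calibrated threshold.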

IMAGE PROCESSING METHOD, MODEL TRAINING METHOD, RELEVANT DEVICES AND ELECTRONIC DEVICE
20220383626 · 2022-12-01 ·

An image processing method includes: obtaining a first categorical feature and M first image features corresponding to M first images respectively, each first image being associated with a task index, task indices associated with different first images being different from each other, M being a positive integer; fusing the M first image features with the first categorical feature respectively so as to obtain M first target features; performing feature extraction on the M first target features so as to obtain M second categorical features; selecting a second categorical feature corresponding to each task index from the M second categorical features, and performing regularization corresponding to the task index on the second categorical feature, to obtain a third categorical feature corresponding to the task index; and performing image processing in accordance with M third categorical features so as to obtain M first image processing results of the M first images.
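The fusion and task-specific regularization steps are not spelled out in the abstract; the sketch below assumes a ViT-style reading, where the shared categorical feature acts as a class token prepended to each image's features and the per-task regularization is a layer-norm-style transform with hypothetical task-specific scale and shift parameters.

```python
import numpy as np

def fuse_with_class_token(image_feats, class_token):
    # Prepend the shared categorical (class) token to each image's feature
    # matrix, yielding one fused target feature per task.
    return [np.concatenate([class_token[None, :], f], axis=0) for f in image_feats]

def task_specific_norm(cls_feat, gamma, beta, eps=1e-5):
    # Per-task regularization sketched as normalization with task-specific
    # scale (gamma) and shift (beta) parameters.
    mu, sigma = cls_feat.mean(), cls_feat.std()
    return gamma * (cls_feat - mu) / (sigma + eps) + beta
```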

Method and apparatus for processing image

Embodiments of the present disclosure disclose a method and apparatus for processing an image. A specific embodiment of the method includes: acquiring a feature map of a target image, where the target image contains a target object; determining a local feature map of a target size in the feature map; combining features of different channels in the local feature map to obtain a local texture feature map; and obtaining location information of the target object based on the local texture feature map.
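"Combining features of different channels" of a local window is left unspecified; one classical way to do it is a Gram matrix of channel responses, a standard texture descriptor, sketched here under that assumption.

```python
import numpy as np

def local_texture_map(feature_map, y, x, size):
    # Crop a local window of the (C, H, W) feature map at (y, x), then combine
    # the channels via their pairwise inner products (a Gram matrix), giving a
    # (C, C) channel-correlation "texture" map for that region.
    patch = feature_map[:, y:y + size, x:x + size]
    c = patch.shape[0]
    flat = patch.reshape(c, -1)
    return flat @ flat.T
```

A localization head could then regress the target object's position from such texture maps rather than from raw activations.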

Action prediction

According to one aspect, action prediction may be implemented via a spatio-temporal feature pyramid graph convolutional network (ST-FP-GCN) including a first pyramid layer, a second pyramid layer, a third pyramid layer, etc. The first pyramid layer may include a first graph convolution network (GCN), a fusion gate, and a first long-short-term-memory (LSTM) gate. The second pyramid layer may include a first convolution operator, a first summation operator, a first mask pool operator, a second GCN, a first upsampling operator, and a second LSTM gate. An output summation operator may sum a first LSTM output and a second LSTM output to generate an output indicative of an action prediction for an inputted image sequence and an inputted pose sequence.
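The fusion gate and the output summation operator are the two simplest pieces of this architecture to make concrete; the toy sketch below shows their elementwise behaviour on plain feature lists (the real operators act on GCN/LSTM tensors).

```python
def fusion_gate(a, b, gate):
    # Elementwise gated fusion: each gate value in [0, 1] blends the two streams.
    return [g * x + (1 - g) * y for g, x, y in zip(gate, a, b)]

def pyramid_output(lstm_out_1, lstm_out_2):
    # Output summation operator: sum the first and second pyramid layers'
    # LSTM outputs to form the action-prediction output.
    return [x + y for x, y in zip(lstm_out_1, lstm_out_2)]
```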

Scene recognition method, training method and device based on pyramid attention

The present invention discloses a scene recognition method, a training method and a device based on pyramid attention, belonging to the field of computer vision. The method includes: layering a color feature map and a depth feature map into pyramids respectively, and calculating the corresponding attention map of each layer; taking the attention output of the last layer as that layer's final feature map; for each remaining layer, adding the upsampled final feature map of the upper layer to the attention output of this layer to obtain the final feature map of this layer; scaling the attention map and the final feature map, and using the average of the two new attention maps as the final attention map; and mapping the largest k positions in the final attention map onto the final feature map of this layer.
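The last two steps, averaging two attention maps and mapping the top-k positions onto the final feature map, can be sketched as follows; the selection rule (largest k scores, features gathered at those positions) is from the abstract, while the concrete shapes are illustrative assumptions.

```python
import numpy as np

def topk_attended_features(attn_a, attn_b, feature_map, k):
    # Average the two (scaled) attention maps, pick the k positions with the
    # largest scores, and gather those positions from the (C, H, W) final
    # feature map, yielding a (C, k) matrix of selected feature columns.
    attn = (attn_a + attn_b) / 2.0
    idx = np.argsort(attn.ravel())[-k:][::-1]     # top-k attention scores
    ys, xs = np.unravel_index(idx, attn.shape)
    return feature_map[:, ys, xs]
```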

DETECTION RESULT ANALYSIS DEVICE, DETECTION RESULT ANALYSIS METHOD, AND COMPUTER READABLE MEDIUM

An evaluation value calculation unit (22) focuses on each of a plurality of layers, as a target layer, in an object detection model which detects a target object included in image data and which is constituted using a neural network, and calculates an evaluation value of the target layer from a heat map representing an activeness degree per pixel in the image data, obtained from an output result of the target layer, and from a detection region where the target object is detected. A layer selection unit (23) selects at least some layers out of the plurality of layers on the basis of the evaluation value.
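The abstract does not define the evaluation formula; a plausible sketch scores each layer by the fraction of its heat-map activation that falls inside the detection region, then keeps layers above a threshold (both the ratio and the threshold rule are assumptions).

```python
import numpy as np

def layer_evaluation_value(heat_map, box):
    # Score a layer by how much of its per-pixel activeness falls inside the
    # detection region (x0, y0, x1, y1) relative to the whole heat map.
    x0, y0, x1, y1 = box
    inside = heat_map[y0:y1, x0:x1].sum()
    total = heat_map.sum()
    return inside / total if total > 0 else 0.0

def select_layers(evals, threshold):
    # Keep the indices of layers whose evaluation value reaches the threshold.
    return [i for i, v in enumerate(evals) if v >= threshold]
```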

METHOD OF PERFORMING OBJECT SEGMENTATION ON VIDEO USING SEMANTIC SEGMENTATION MODEL, DEVICE AND STORAGE MEDIUM

A method of performing object segmentation on a video using a semantic segmentation model, a device, and a storage medium, which relate to the field of artificial intelligence, in particular to computer vision and deep learning technologies. The method includes: sequentially inputting a current video frame and a previous video frame into a first feature extraction network to obtain a feature map sequence; sequentially inputting object segmentation information of the previous video frame into a second feature extraction network to obtain a segmentation feature sequence; sequentially inputting the current video frame and the previous video frame into a temporal encoding network to obtain a temporal feature sequence; generating a fused feature sequence based on the feature map sequence, the segmentation feature sequence and the temporal feature sequence; and inputting the fused feature sequence into a segmentation network to obtain object segmentation information of the current video frame.
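How the three sequences are fused is not stated; the simplest reading is per-frame channel concatenation, sketched below (the fusion rule is an assumption, and real implementations might use learned weighting instead).

```python
import numpy as np

def fuse_sequences(feat_seq, seg_seq, temp_seq):
    # Fuse the feature map, segmentation feature, and temporal feature
    # sequences frame by frame via channel concatenation.
    return [np.concatenate([f, s, t], axis=0)
            for f, s, t in zip(feat_seq, seg_seq, temp_seq)]
```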

DEVICE AND METHOD FOR GENERATING SPEECH VIDEO ALONG WITH LANDMARK
20220375224 · 2022-11-24 ·

A speech video generation device according to an embodiment includes: a first encoder, which receives an input of a person background image, i.e., the video part of a speech video of a predetermined person, and extracts an image feature vector from the person background image; a second encoder, which receives an input of a speech audio signal, i.e., the audio part of the speech video, and extracts a voice feature vector from the speech audio signal; a combining unit, which generates a combined vector by combining the image feature vector output from the first encoder and the voice feature vector output from the second encoder; a first decoder, which reconstructs the speech video of the person using the combined vector as an input; and a second decoder, which predicts a landmark of the speech video using the combined vector as an input.
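The combining unit's operation is unspecified; concatenation is the usual default and is assumed here, with each decoder sketched as a single hypothetical linear map in place of the real reconstruction and landmark networks.

```python
import numpy as np

def combine(image_vec, voice_vec):
    # Combining unit: concatenate the image feature vector from the first
    # encoder with the voice feature vector from the second encoder.
    return np.concatenate([image_vec, voice_vec])

def decode(combined, weights):
    # A decoder head sketched as one linear map; the actual first/second
    # decoders would be deep networks producing video frames / landmarks.
    return weights @ combined
```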

IMAGE RECOGNITION METHOD AND APPARATUS, TRAINING METHOD, ELECTRONIC DEVICE, AND STORAGE MEDIUM
20220375207 · 2022-11-24 ·

An image recognition method and apparatus, a training method, an electronic device, and a storage medium are provided. The image recognition method includes: acquiring an image to be recognized, the image to be recognized including a target text; and determining text content of the target text based on knowledge information and image information of the image to be recognized.