G06V10/806

HUMAN-OBJECT INTERACTION DETECTION

A human-object interaction detection method, a neural network, and a training method therefor are provided. The human-object interaction detection method includes: extracting a plurality of first target features and one or more first motion features from an image feature of an image to be detected; fusing each first target feature with at least some of the first motion features to obtain enhanced first target features; fusing each first motion feature with at least some of the first target features to obtain enhanced first motion features; processing the enhanced first target features to obtain target information of a plurality of targets including human targets and object targets; processing the enhanced first motion features to obtain motion information of one or more motions, where each motion is associated with one human target and one object target; and matching the plurality of targets with the one or more motions to obtain a human-object interaction detection result.
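
The mutual fusion of target and motion features can be illustrated with a toy sketch; the attention-style weighting, the residual addition, and the feature dimensions below are illustrative assumptions, not the claimed implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_fuse(queries, keys):
    """Enhance each query feature with an attention-weighted sum of key features."""
    scores = softmax(queries @ keys.T / np.sqrt(queries.shape[-1]))
    return queries + scores @ keys  # residual fusion

rng = np.random.default_rng(0)
target_feats = rng.normal(size=(5, 16))  # 5 first target features
motion_feats = rng.normal(size=(2, 16))  # 2 first motion features

# Each set is enhanced with information from the other set.
enhanced_targets = cross_fuse(target_feats, motion_feats)
enhanced_motions = cross_fuse(motion_feats, target_feats)
print(enhanced_targets.shape, enhanced_motions.shape)  # (5, 16) (2, 16)
```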

HUMAN-OBJECT INTERACTION DETECTION

A human-object interaction detection method, a neural network, and a training method therefor are provided. The human-object interaction detection method includes: performing first target feature extraction on image features of an image to obtain first target features; performing first interaction feature extraction on the image features to obtain first interaction features and scores thereof; selecting at least some of the first interaction features based on the score of each first interaction feature; determining first motion features based on the selected first interaction features and the image features; processing the first target features to obtain target information of targets in the image; processing the first motion features to obtain motion information of one or more motions in the image; and matching the targets with the motions to obtain a human-object interaction detection result.
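
The score-based selection step might look like the following top-k sketch; the selection rule and the value of `k` are assumed for illustration, since the abstract does not fix a particular criterion.

```python
import numpy as np

def select_top_interactions(interaction_feats, scores, k):
    """Keep the k interaction features with the highest scores."""
    idx = np.argsort(scores)[::-1][:k]  # indices sorted by descending score
    return interaction_feats[idx], idx

rng = np.random.default_rng(1)
feats = rng.normal(size=(10, 16))   # 10 candidate first interaction features
scores = rng.uniform(size=10)       # one score per interaction feature

kept, idx = select_top_interactions(feats, scores, k=4)
print(kept.shape)  # (4, 16)
```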

METHOD FOR TRAINING STUDENT NETWORK AND METHOD FOR RECOGNIZING IMAGE

Disclosed are a method for training a Student Network and a method for recognizing an image. The training method includes: inputting a sample image into a Student Network to acquire first prediction feature information of the sample image at a first granularity and second prediction feature information of the sample image at a second granularity; inputting the sample image into a Teacher Network to acquire first feature information of the sample image at the first granularity and second feature information of the sample image at the second granularity; and acquiring a target Student Network based on the prediction feature information and the feature information.
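
One plausible reading of the two-granularity setup is a distillation loss that compares the student's predictions with the teacher's features at each granularity; the mean-squared-error form below is an assumption for illustration, not the disclosed training objective.

```python
import numpy as np

def distill_loss(student_fine, student_coarse, teacher_fine, teacher_coarse):
    """Sum of mean-squared errors between student predictions and teacher
    features at the first (fine) and second (coarse) granularities."""
    fine = np.mean((student_fine - teacher_fine) ** 2)
    coarse = np.mean((student_coarse - teacher_coarse) ** 2)
    return fine + coarse

# Toy tensors: the fine-granularity predictions already match the teacher,
# the coarse-granularity predictions are off by 1 everywhere.
s_fine, t_fine = np.ones((4, 8)), np.ones((4, 8))
s_coarse, t_coarse = np.ones(8), np.zeros(8)
print(distill_loss(s_fine, s_coarse, t_fine, t_coarse))  # 1.0
```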

SENSOR TRANSFORMATION ATTENTION NETWORK (STAN) MODEL

Provided is a sensor transformation attention network (STAN) model including: sensors configured to collect input signals; attention modules configured to calculate attention scores of feature vectors corresponding to the input signals; a merge module configured to calculate attention values from the attention scores and to generate a merged transformation vector based on the attention values and the feature vectors; and a task-specific module configured to classify the merged transformation vector.
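
The merge module's role can be sketched as a softmax over per-sensor attention scores followed by an attention-weighted sum of the feature vectors; the exact scoring and merging functions here are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def merge(feature_vectors, attention_scores):
    """Normalize per-sensor attention scores into attention values, then form
    the merged transformation vector as their weighted sum."""
    attention_values = softmax(np.asarray(attention_scores, dtype=float))
    return attention_values @ np.stack(feature_vectors)

# Two sensors with equal attention scores -> elementwise average of features.
f1, f2 = np.ones(8), 3 * np.ones(8)
merged = merge([f1, f2], [0.0, 0.0])
print(merged[0])  # 2.0
```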

Target Detection Method and Apparatus
20230045519 · 2023-02-09 ·

A target detection method and apparatus. The method comprises: acquiring an input image, and sending it to a candidate region generation network to generate a plurality of regions of interest; formatting the plurality of regions of interest, and then sending them to a target key point network to generate a heat map; performing convolution on the heat map using a global feature map of the input image, so as to generate a local depth feature map; and fusing the global feature map and the local depth feature map, and detecting a target therefrom by means of a detector. The present invention can be applied to target detection at different scales, improves the detection accuracy and robustness of target detection for an occluded target in complex scenarios, and, by making full use of local key point information of the target, achieves target positioning under occlusion.
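
One simplified way to picture the heat-map-guided local feature map is elementwise modulation of the global feature map by the key-point heat map, followed by fusion; this stands in for the convolution the abstract describes, and all shapes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
global_feat = rng.normal(size=(1, 32, 32))  # C x H x W global feature map
heatmap = np.zeros((32, 32))
heatmap[10:14, 10:14] = 1.0                 # key-point response region

# Emphasize features near predicted key points, then fuse with the global map.
local_feat = global_feat * heatmap          # local depth feature map (sketch)
fused = global_feat + local_feat            # fused map passed to the detector
print(fused.shape)  # (1, 32, 32)
```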

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING PROGRAM, AND INFORMATION PROCESSING METHOD

Provided is an information processing apparatus capable of reducing the processing load when a plurality of different sensors is used. An information processing apparatus according to an embodiment includes a recognition processing unit (15, 40b) configured to perform recognition processing for recognizing a target object by adding, to an output of a first sensor (23), region information generated according to object likelihood detected in the course of object recognition processing based on an output of a second sensor (21) different from the first sensor.

CONDITIONAL IMAGE GENERATION USING ONE OR MORE NEURAL NETWORKS
20230045076 · 2023-02-09 ·

Apparatuses, systems, and techniques are presented to generate one or more images. In at least one embodiment, one or more neural networks are used to generate one or more images based, at least in part, upon one or more input types.

Systems and Methods for Image Based Perception

Systems and methods for image-based perception. The methods comprise: obtaining, by a computing device, images captured by a plurality of cameras with overlapping fields of view; generating, by the computing device, spatial feature maps indicating locations of features in the images; defining, by the computing device, predicted cuboids at each location of an object in the images based on the spatial feature maps; and assigning, by the computing device, at least two of the predicted cuboids to a given object when predictions from images captured by separate cameras of the plurality of cameras should be associated with the same detected object.

Autonomous driving with surfel maps

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using a surfel map to generate a prediction for a state of an environment. One of the methods includes obtaining surfel data comprising a plurality of surfels, wherein each surfel corresponds to a respective different location in an environment, and each surfel has associated data that comprises an uncertainty measure; obtaining sensor data for one or more locations in the environment, the sensor data having been captured by one or more sensors of a first vehicle; determining one or more particular surfels corresponding to respective locations of the obtained sensor data; and combining the surfel data and the sensor data to generate a respective object prediction for each of the one or more locations of the obtained sensor data.
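
Combining surfel data (with its per-surfel uncertainty measure) and sensor data can be sketched as inverse-variance weighted fusion of a prior and an observation; this Bayesian-style rule is an illustrative assumption, not the patented procedure.

```python
def combine(surfel_value, surfel_var, sensor_value, sensor_var):
    """Fuse a surfel's stored estimate with a new sensor observation, weighting
    each by the inverse of its variance; returns the fused estimate and its
    (reduced) variance."""
    w = sensor_var / (surfel_var + sensor_var)
    fused = w * surfel_value + (1 - w) * sensor_value
    var = surfel_var * sensor_var / (surfel_var + sensor_var)
    return fused, var

# Equally uncertain prior and observation -> their average, with halved variance.
fused, var = combine(0.0, 1.0, 2.0, 1.0)
print(fused, var)  # 1.0 0.5
```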

Multimodal sentiment classification

Sentiment classification can be implemented by an entity-level multimodal sentiment classification neural network. The neural network can include left, right, and target entity subnetworks. The neural network can further include an image network that generates representation data that is combined and weighted with data output by the left, right, and target entity subnetworks to output a sentiment classification for an entity included in a network post.
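
The weighting of the image representation against the left, right, and target entity subnetwork outputs might be sketched as a gated concatenation followed by a linear sentiment classifier; the gate, dimensions, and three-class output are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def classify(left, target, right, image_repr, gate, W):
    """Weight the image representation by a gate, concatenate it with the
    left/right/target subnetwork outputs, and project to sentiment
    probabilities (e.g. negative/neutral/positive)."""
    combined = np.concatenate([left, target, right, gate * image_repr])
    return softmax(W @ combined)

rng = np.random.default_rng(3)
d = 4
W = rng.normal(size=(3, 4 * d))  # 3 sentiment classes, 4 concatenated parts
probs = classify(rng.normal(size=d), rng.normal(size=d),
                 rng.normal(size=d), rng.normal(size=d), gate=0.5, W=W)
print(probs.shape)  # (3,)
```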