Patent classifications
G06V10/806
METHOD AND DEVICE FOR 3D OBJECT DETECTION
A method and device for 3D object detection. The method comprises the steps of: generating one or more fused images (201) based on a pair of images (530), the pair of images (530) including a left view image (101, 710a) and a right view image (102, 710b); extracting one or more fused features from the fused images (201) by a single backbone network (210) of a shared network with feature unmixing (SNFU, 540); unmixing the fused features into a left view-aware feature and a right view-aware feature by a feature unmixing sub-network (220) of the SNFU (540); predicting the 3D object based on the left view-aware feature and the right view-aware feature; and determining spatial features of the predicted 3D object. The proposed method and device can reduce the computational complexity to the level of monocular 2D object detection, improving computational efficiency while maintaining high precision. In addition, the PCNet network for depth estimation can achieve competitive accuracy and high speed simultaneously when predicting the depth of the 3D object.
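The fuse → single shared backbone → unmix flow described above can be sketched as follows. All shapes, the linear layers, and the names are hypothetical stand-ins for the networks (210, 220); this illustrates the data flow, not the patent's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_images(left, right):
    # Form a fused image by concatenating the stereo pair along channels.
    return np.concatenate([left, right], axis=-1)

def backbone(fused, w):
    # Single shared backbone: here a single (hypothetical) linear map.
    return fused @ w

def unmix(features, w_left, w_right):
    # Feature unmixing sub-network: split the fused features back into
    # a left-view-aware and a right-view-aware feature map.
    return features @ w_left, features @ w_right

h, w_, c = 4, 4, 3
left = rng.standard_normal((h, w_, c))
right = rng.standard_normal((h, w_, c))

fused = fuse_images(left, right)                      # shape (4, 4, 6)
feats = backbone(fused, rng.standard_normal((2 * c, 8)))
f_left, f_right = unmix(feats,
                        rng.standard_normal((8, 8)),
                        rng.standard_normal((8, 8)))
```

The point of the fused/unmixed design is that only one backbone pass is needed per stereo pair, which is where the monocular-level complexity claim comes from.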
VIDEO PERSONNEL RE-IDENTIFICATION METHOD BASED ON TRAJECTORY FUSION IN COMPLEX UNDERGROUND SPACE
Disclosed is a video personnel re-identification method based on trajectory fusion in a complex underground space. Accurate personnel trajectory prediction is realized through the Social-GAN model, and a spatio-temporal trajectory fusion model is constructed: personnel trajectory videos that are unaffected by occlusion are introduced into the re-identification network to solve the problem of false extraction of apparent visual features caused by occlusion. In addition, a trajectory-fusion MARS_traj data set is constructed by adding time-frame and spatial-coordinate information to the MARS data set.
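The spatio-temporal fusion idea above can be illustrated with a minimal sketch: keep observed coordinates where the person is visible and fall back to Social-GAN-style predictions during occluded frames. The `None` encoding for occluded frames is an assumption for illustration.

```python
def fuse_trajectory(observed, predicted):
    # Spatio-temporal fusion: keep observed coordinates where available
    # and fall back to predicted coordinates during occlusion
    # (occluded frames are encoded here as None; the encoding is hypothetical).
    return [obs if obs is not None else pred
            for obs, pred in zip(observed, predicted)]

observed = [(0, 0), None, None, (3, 3)]
predicted = [(0, 0), (1, 1), (2, 2), (3, 2)]
fused = fuse_trajectory(observed, predicted)
# -> [(0, 0), (1, 1), (2, 2), (3, 3)]
```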
OBJECT MODELING AND MOVEMENT METHOD AND APPARATUS, AND DEVICE
The present invention discloses an object modeling and movement method. The method is applied to a mobile terminal, and the mobile terminal includes a color camera and a TOF module. The method includes: performing panoramic scanning on a target object by using the color camera and the TOF module, to obtain a 3D model of the target object; obtaining a target skeletal model; fusing the target skeletal model and the 3D model of the target object; obtaining a target movement manner; and controlling the target skeletal model in the target movement manner, to animate the 3D model of the target object in the target movement manner. This integrates scanning, 3D reconstruction, skeletal rigging, and preset animation display for an object on one terminal, thereby animating a static object and increasing user engagement with the mobile terminal.
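The scan-to-animation pipeline above can be sketched end to end; every type and step here is a hypothetical stand-in for the on-terminal processing, intended only to show how the stages chain together.

```python
# Minimal pipeline sketch of the scan-to-animation flow; every name and
# data structure is a hypothetical placeholder, not the patent's API.

def scan_object():
    # Panoramic scan with color camera + TOF module -> 3D model.
    return {"mesh": "target_mesh"}

def fuse(model, skeleton):
    # Skeletal rigging: bind the target skeletal model to the scanned mesh.
    return {**model, "skeleton": skeleton}

def animate(rigged, movement):
    # Drive the skeleton in the target movement manner; the bound mesh
    # follows, animating the formerly static scanned object.
    return f"{rigged['mesh']} rigged with {rigged['skeleton']} plays {movement}"

model = scan_object()
rigged = fuse(model, "humanoid_skeleton")
clip = animate(rigged, "walk_cycle")
```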
IMAGE RECOGNITION METHOD AND ELECTRONIC APPARATUS THEREOF
An image recognition method and an electronic apparatus configured for image recognition are provided. A training sample set is provided to train a recognition model including neural networks, so that the trained recognition model can recognize the classification label to which a test image belongs. The training sample set includes image sets respectively belonging to users. During the training process, training images corresponding to classification labels are obtained from a first image set in the training sample set as reference images for training; a training image is obtained from a second image set, different from the first image set, as an input image for training; and the reference images for training and the input image for training are provided as inputs to the neural networks for training. The input to each neural network includes at least one of the reference images for training and the input image for training.
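The sampling scheme above (reference images per label from one user's image set, input image from a different user's set) can be sketched as follows; the nested-dict layout of the training sample set is an assumption for illustration.

```python
import random

def sample_training_inputs(sample_set, rng):
    # sample_set: {user_id: {label: [images]}} (hypothetical structure).
    # Reference images come from a first user's image set; the input
    # image comes from a second, different user's image set.
    first, second = rng.sample(sorted(sample_set), 2)
    references = {lbl: rng.choice(imgs)
                  for lbl, imgs in sample_set[first].items()}
    label = rng.choice(sorted(sample_set[second]))
    input_image = rng.choice(sample_set[second][label])
    return references, input_image, label

rng = random.Random(0)
data = {
    "userA": {"cat": ["a_cat1", "a_cat2"], "dog": ["a_dog1"]},
    "userB": {"cat": ["b_cat1"], "dog": ["b_dog1", "b_dog2"]},
}
refs, img, lbl = sample_training_inputs(data, rng)
```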
Methods and systems for augmenting depth data from a depth sensor, such as with data from a multiview camera system
Methods of determining the depth of a scene and associated systems are disclosed herein. In some embodiments, a method can include augmenting depth data of a scene captured with a depth sensor with depth data from one or more images of the scene. For example, the method can include capturing image data of the scene with a plurality of cameras. The method can further include generating a point cloud representative of the scene based on the depth data from the depth sensor and identifying a missing region of the point cloud, such as a region occluded from the view of the depth sensor. The method can then include generating depth data for the missing region based on the image data. Finally, the depth data for the missing region can be merged with the depth data from the depth sensor to generate a merged point cloud representative of the scene.
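The merge step above can be sketched with depth maps: identify the region missing from the sensor's point cloud and fill it from image-derived depth. Encoding missing pixels as NaN is an assumption for illustration.

```python
import numpy as np

def merge_depth(sensor_depth, image_depth):
    # Identify the missing region (e.g. pixels occluded from the depth
    # sensor, encoded here as NaN) and fill it with depth derived from
    # the multiview camera images; elsewhere, keep the sensor depth.
    missing = np.isnan(sensor_depth)
    merged = np.where(missing, image_depth, sensor_depth)
    return merged, missing

sensor = np.array([[1.0, np.nan],
                   [2.0, np.nan]])       # right column occluded
from_images = np.array([[9.0, 3.0],
                        [9.0, 4.0]])     # image-derived depth
merged, missing = merge_depth(sensor, from_images)
# merged keeps sensor depth where available and image depth in the hole
```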
SYSTEM AND METHOD FOR THE FUSION OF BOTTOM-UP WHOLE-IMAGE FEATURES AND TOP-DOWN ENTITY CLASSIFICATION FOR ACCURATE IMAGE/VIDEO SCENE CLASSIFICATION
Described is a system and method for accurate image and/or video scene classification. More specifically, described is a system that makes use of a specialized convolutional-neural network (hereafter CNN) based technique for the fusion of bottom-up whole-image features and top-down entity classification. When the two parallel and independent processing paths are fused, the system provides an accurate classification of the scene as depicted in the image or video.
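The fusion of the two parallel paths can be sketched as follows; the feature sizes, the concatenation, and the linear fusion layer are hypothetical stand-ins for the CNN-based fusion described above.

```python
import numpy as np

def fuse_paths(whole_image_feat, entity_scores, w_fuse):
    # Concatenate the bottom-up whole-image feature vector with the
    # top-down entity-classification scores, then apply a (hypothetical)
    # linear fusion layer followed by a softmax over scene classes.
    x = np.concatenate([whole_image_feat, entity_scores])
    logits = x @ w_fuse
    e = np.exp(logits - logits.max())    # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(2)
whole = rng.standard_normal(16)   # bottom-up CNN feature (size assumed)
entities = rng.random(5)          # top-down entity classifier scores
probs = fuse_paths(whole, entities, rng.standard_normal((21, 4)))
```

Because the two paths are computed independently, each can be trained or swapped out on its own before the fusion layer combines them.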
CONTEXTUAL VISUAL-BASED SAR TARGET DETECTION METHOD AND APPARATUS, AND STORAGE MEDIUM
A contextual visual-based synthetic-aperture radar (SAR) target detection method and apparatus, and a storage medium, belonging to the field of target detection, are described. The method includes: obtaining an SAR image; inputting the SAR image into a target detection model; and positioning and recognizing a target in the SAR image by using the target detection model, to obtain a detection result. In the present disclosure, a two-way multi-scale connection operation is enhanced through top-down and bottom-up attention, to guide learning of dynamic attention matrices and enhance feature interaction across different resolutions. The model can thus extract multi-scale target feature information with higher accuracy for bounding-box regression and classification, suppressing interfering background information and enhancing visual expressiveness. After the attention enhancement module is added, detection performance is greatly improved with almost no increase in the parameter count or computational cost of the whole neck.
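The two-way (top-down plus bottom-up) attention idea can be sketched at two scales; the sigmoid gating, the resampling, and all shapes are assumptions for illustration, not the patent's attention matrices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(guide, feat):
    # A dynamic attention map derived from the guiding scale modulates `feat`.
    return sigmoid(guide.mean(axis=-1, keepdims=True)) * feat

def two_way_fusion(high_res, low_res):
    # Top-down: the coarse map, upsampled, gates the fine map;
    # bottom-up: the fine map, pooled, gates the coarse map.
    # Shapes are hypothetical: low_res is high_res downsampled by 2.
    up = low_res.repeat(2, axis=0).repeat(2, axis=1)        # nearest upsample
    td = high_res + attention_gate(up, high_res)            # top-down path
    pooled = high_res.reshape(low_res.shape[0], 2,
                              low_res.shape[1], 2, -1).mean((1, 3))
    bu = low_res + attention_gate(pooled, low_res)          # bottom-up path
    return td, bu

rng = np.random.default_rng(3)
fine = rng.standard_normal((4, 4, 8))
coarse = rng.standard_normal((2, 2, 8))
td, bu = two_way_fusion(fine, coarse)
```

Note the gating adds only an elementwise product per scale, consistent with the claim that the enhancement barely grows the neck's parameter count.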
METHOD AND APPARATUS FOR IMAGE RESTORATION BASED ON BURST IMAGE
A method and apparatus for image restoration based on burst images. The method includes generating a plurality of feature representations corresponding to individual images of a burst image set by encoding the individual images, determining a reference feature representation from among the plurality of feature representations, determining a first comparison pair including the reference feature representation and a first feature representation of the plurality of feature representations, generating a first motion-embedding feature representation of the first comparison pair based on a similarity score map of the reference feature representation and the first feature representation, generating a fusion result by fusing a plurality of motion-embedding feature representations including the first motion-embedding feature representation, and generating at least one restored image by decoding the fusion result.
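The similarity-score-map step above can be sketched as a per-pixel cosine similarity between the reference feature representation and another feature representation of the burst; the cosine choice, the weighting, and the mean fusion are assumptions standing in for the patent's motion-embedding and fusion steps.

```python
import numpy as np

def similarity_score_map(ref, feat, eps=1e-8):
    # Per-pixel cosine similarity between the reference feature map and
    # another feature map of the burst; result has shape (H, W).
    num = (ref * feat).sum(axis=-1)
    den = np.linalg.norm(ref, axis=-1) * np.linalg.norm(feat, axis=-1) + eps
    return num / den

def motion_embed(ref, feat):
    # Weight the candidate feature by its similarity to the reference:
    # a simple stand-in for generating a motion-embedding feature.
    s = similarity_score_map(ref, feat)
    return feat * s[..., None]

rng = np.random.default_rng(1)
ref = rng.standard_normal((4, 4, 8))     # reference feature representation
feat = rng.standard_normal((4, 4, 8))    # first feature representation
emb = motion_embed(ref, feat)
# Fusing several motion-embedding representations could be as simple as a mean.
fused = np.mean([motion_embed(ref, ref), emb], axis=0)
```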
IMAGE GENERATION USING SURFACE-BASED NEURAL SYNTHESIS
Aspects of the present disclosure involve a system and a method for performing operations comprising: receiving a two-dimensional continuous surface representation of a three-dimensional object, the continuous surface representation comprising a plurality of landmark locations; determining a first set of soft membership functions based on the relative locations of points in the two-dimensional continuous surface representation and the landmark locations; receiving a two-dimensional input image, the input image comprising an image of the object; extracting a plurality of features from the input image using a feature recognition model; generating an encoded feature representation of the extracted features using the first set of soft membership functions; generating a dense feature representation of the extracted features from the encoded representation using a second set of soft membership functions; and processing the second set of soft membership functions and the dense feature representation using a neural image decoder model to generate an output image.
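A soft membership function over landmark locations can be sketched as a softmax over negative squared distances, so each point gets a weight for every landmark that sums to one; the distance metric and temperature are assumptions, not the disclosure's definition.

```python
import numpy as np

def soft_membership(points, landmarks, tau=1.0):
    # Soft membership of each surface point to each landmark: a softmax
    # over negative squared distances (temperature tau is hypothetical).
    d2 = ((points[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
    logits = -d2 / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)      # rows sum to 1

points = np.array([[0.0, 0.0], [1.0, 1.0]])
landmarks = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
m = soft_membership(points, landmarks)
# Each row sums to 1; the nearest landmark gets the largest weight.
```

Softness matters here: unlike a hard nearest-landmark assignment, the weights vary smoothly with position, which keeps the encoding differentiable for the decoder.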
METHOD AND DEVICE FOR ACQUIRING STATE DATA INDICATING STATE OF USER
Provided is a method of obtaining state data indicating a state of a user. The method includes: obtaining estimation models for obtaining pieces of state data existing in a plurality of layers from sensor data obtained by a sensor; obtaining at least one piece of sensor data; obtaining state data of a lower layer from the at least one piece of sensor data, based on the estimation models; and obtaining state data of a higher layer from the state data of the lower layer, based on the estimation models.
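The layered estimation above (sensor data → lower-layer state → higher-layer state) can be sketched with two chained models; the threshold, labels, and mapping are hypothetical examples of estimation models, not the disclosure's.

```python
# Hierarchical state estimation sketch: hypothetical estimation models map
# sensor data to a lower-layer state, then that state to a higher-layer state.

def lower_layer_model(sensor_data):
    # e.g. raw accelerometer readings -> coarse motion state (hypothetical)
    return "moving" if max(sensor_data) > 0.5 else "still"

def higher_layer_model(lower_state):
    # e.g. coarse motion state -> user activity label (hypothetical)
    return {"moving": "walking", "still": "resting"}[lower_state]

def estimate(sensor_data):
    lower = lower_layer_model(sensor_data)
    return lower, higher_layer_model(lower)

lower, higher = estimate([0.1, 0.7, 0.2])
# -> ("moving", "walking")
```

The layering means each model only needs to understand the layer directly below it, so models at different layers can be obtained and updated independently.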