Patent classifications
G06V20/647
DATASET GENERATION METHOD FOR SELF-SUPERVISED LEARNING SCENE POINT CLOUD COMPLETION BASED ON PANORAMAS
The present invention belongs to the technical field of 3D reconstruction in computer vision, and provides a dataset generation method for self-supervised learning scene point cloud completion based on panoramas. Taking RGB panoramas, depth panoramas and normal panoramas of the same view as input, the method generates pairs of incomplete and target point clouds carrying RGB and normal information, which form a self-supervised learning dataset for training a scene point cloud completion network. The key points of the present invention are occlusion prediction and equirectangular projection based on view conversion, and the handling of the stripe problem and the point-to-point occlusion problem that arise during conversion. The method simplifies the collection of point cloud data in real scenes, predicts occlusion by means of view conversion, and defines a view selection strategy.
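As a rough illustration of the equirectangular step only, the sketch below back-projects a depth panorama into 3D points and carries the RGB value of each pixel along. The function name `panorama_to_points`, the axis convention (y up, z forward) and the treatment of depth as range along the viewing ray are assumptions made for this example; the abstract does not specify them, and the occlusion prediction and view conversion steps are not covered.

```python
import math


def panorama_to_points(depth, rgb=None):
    """Back-project an equirectangular depth panorama into 3D points.

    depth: list of rows of range values (distance along each viewing ray).
    rgb:   optional list of rows of colors with the same shape as depth.

    Assumed convention (not taken from the abstract): longitude runs from
    -pi to pi left to right, latitude from +pi/2 to -pi/2 top to bottom,
    y points up and z points forward.
    """
    height = len(depth)
    width = len(depth[0])
    points = []
    for v in range(height):
        # Latitude at the center of pixel row v.
        lat = math.pi / 2 - (v + 0.5) / height * math.pi
        for u in range(width):
            d = depth[v][u]
            if d <= 0:
                continue  # skip pixels without a valid depth sample
            # Longitude at the center of pixel column u.
            lon = (u + 0.5) / width * 2 * math.pi - math.pi
            x = d * math.cos(lat) * math.sin(lon)
            y = d * math.sin(lat)
            z = d * math.cos(lat) * math.cos(lon)
            color = rgb[v][u] if rgb is not None else None
            points.append(((x, y, z), color))
    return points
```

Each returned point lies at exactly the stored range from the panorama center, so a second view could in principle be simulated by translating the points and projecting them again.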
TRAINING METHOD AND APPARATUS FOR A TARGET DETECTION MODEL, TARGET DETECTION METHOD AND APPARATUS, AND STORAGE MEDIUM
Provided are a training method and apparatus for a target detection model, a target detection method and apparatus, a device, and a medium, which relate to the field of artificial intelligence and, in particular, to computer vision and deep learning technologies, and which may be applied to 3D visual scenes. A specific implementation includes acquiring a sample image marked with a difficult region; inputting the sample image into a first target detection model and calculating a first loss corresponding to the difficult region; and increasing the first loss and training the first target detection model according to the increased first loss. In this manner, the accuracy of target detection can be improved and the cost of target detection can be reduced.
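One way to picture "increasing the first loss" is to scale the loss terms that fall inside the marked difficult region before summing. The sketch below does this over a per-pixel loss grid; the function name `weighted_detection_loss`, the mask representation and the multiplicative `boost` factor are assumptions for illustration, since the abstract does not say how the loss is increased.

```python
def weighted_detection_loss(per_pixel_loss, difficult_mask, boost=2.0):
    """Sum a loss grid, scaling the part that lies in the difficult region.

    per_pixel_loss: list of rows of non-negative loss values.
    difficult_mask: same shape; truthy where the sample is marked difficult.
    boost:          assumed multiplicative factor (> 1 increases the loss).
    """
    difficult = 0.0
    other = 0.0
    for loss_row, mask_row in zip(per_pixel_loss, difficult_mask):
        for loss, is_difficult in zip(loss_row, mask_row):
            if is_difficult:
                difficult += loss
            else:
                other += loss
    # Only the difficult-region loss is increased; the rest is unchanged.
    return other + boost * difficult
```

With `boost=1.0` the result reduces to the plain summed loss, which makes the effect of the weighting easy to compare.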
Techniques for improving mesh accuracy using labeled inputs
A method and system for improving a three-dimensional (3D) representation of objects using semantic data. The method comprises receiving input data generated in response to video captured in a filming area; setting at least one parameter for each region in the input data; and generating a 3D representation based in part on the at least one parameter and semantic data associated with the input data.
Methods and Systems for Augmented Reality Tracking Based on Volumetric Feature Descriptor Data
An illustrative augmented reality tracking system obtains a volumetric feature descriptor dataset that includes: 1) a plurality of feature descriptors associated with a plurality of views of a volumetric target, and 2) a plurality of 3D structure datapoints that correspond to the plurality of feature descriptors. The system also obtains an image frame captured by a user equipment (UE) device. The system identifies a set of image features depicted in the image frame and detects, based on a match between the set of image features depicted in the image frame and a set of feature descriptors of the plurality of feature descriptors, that the volumetric target is depicted in the image frame. In response to this detecting and based on 3D structure datapoints corresponding to matched feature descriptors, the system determines a spatial relationship between the UE device and the volumetric target. Corresponding methods and systems are also disclosed.
ADAPTIVE BOUNDING FOR THREE-DIMENSIONAL MORPHABLE MODELS
Systems and techniques are provided for generating one or more models. For example, a process can include obtaining a plurality of input images corresponding to faces of one or more people during a training interval. The process can include determining a value of a coefficient representing at least a portion of a facial expression for each of the plurality of input images during the training interval. The process can include determining, from the determined values of the coefficient, an extremum value of the coefficient during the training interval. The process can include generating an updated bounding value for the coefficient based on an initial bounding value and the extremum value.
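The bounding update can be sketched as follows for an upper bound. The abstract only states that the updated value depends on the initial bounding value and the extremum; the linear blend, the `blend` parameter and the function name `update_bound` are assumptions chosen for this example.

```python
def update_bound(initial_bound, coefficient_values, blend=0.5):
    """Move an upper bound toward the largest coefficient value observed.

    initial_bound:      bounding value before the training interval.
    coefficient_values: coefficient values, one per input image.
    blend:              assumed mixing weight in [0, 1]; 0 keeps the
                        initial bound, 1 adopts the observed extremum.
    """
    extremum = max(coefficient_values)
    return (1.0 - blend) * initial_bound + blend * extremum
```

A lower bound would be handled the same way with `min` in place of `max`.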
System and method for detecting in-vehicle conflicts
Embodiments of the disclosure provide a method for detecting an inter-person conflict. The method includes receiving at least one image from an image data resource, where the at least one image is captured by at least one camera. The method further includes detecting human objects from the at least one image. The method additionally includes determining a distance between two of the detected human objects. The method additionally includes determining whether there is a conflict between the two of the detected human objects based on the determined distance.
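The distance test described above can be sketched as a pairwise comparison over detected positions. The function name `detect_conflicts`, the use of Euclidean distance between position tuples and the fixed `threshold` are assumptions for illustration; the abstract does not state how the distance is measured or what value triggers a conflict.

```python
import math
from itertools import combinations


def detect_conflicts(positions, threshold=0.5):
    """Return index pairs of detected people closer than a threshold.

    positions: one coordinate tuple per detected human object.
    threshold: assumed distance below which a pair is flagged.
    """
    conflicts = []
    for (i, a), (j, b) in combinations(enumerate(positions), 2):
        if math.dist(a, b) < threshold:
            conflicts.append((i, j))
    return conflicts
```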
Product defect detection method and apparatus, electronic device and storage medium
A product defect detection method and apparatus, an electronic device, and a storage medium are provided. The method includes: acquiring a multi-channel image of a target product; inputting the multi-channel image to a defect detection model, wherein the defect detection model includes a plurality of convolutional branches, a merging module and a convolutional head branch; performing feature extraction on each channel in the multi-channel image by using the plurality of convolutional branches, to obtain a plurality of pieces of first characteristic information; merging the plurality of pieces of first characteristic information by using the merging module, to obtain second characteristic information; performing feature extraction on the second characteristic information by using the convolutional head branch, to obtain third characteristic information to be output by the defect detection model; and determining defect information of the target product based on the third characteristic information.
GEMSTONE PLANNING
A method of determining an optimal target gemstone to be obtained from a rough gemstone comprises obtaining a first series of 2D images of the rough gemstone; providing a 3D model of a target gemstone to be obtained from the rough gemstone; and generating a second series of 2D images of the target gemstone from the 3D model thereof. The method then comprises comparing the first and second series of 2D images to determine an optimal transformation to be applied to the 3D model of the target gemstone.
THREE-DIMENSIONAL OBJECT DETECTION BASED ON IMAGE DATA
Techniques are discussed herein for generating three-dimensional (3D) representations of an environment based on two-dimensional (2D) image data, and using the 3D representations to perform 3D object detection and other 3D analyses of the environment. 2D image data may be received, along with depth estimation data associated with the 2D image data. Using the 2D image data and associated depth data, an image-based object detector may generate 3D representations, including point clouds and/or 3D pixel grids, for the 2D image or particular regions of interest. In some examples, a 3D point cloud may be generated by projecting pixels from the 2D image into 3D space, after which a trained 3D convolutional neural network (CNN) performs object detection on the point cloud. Additionally or alternatively, a top-down view of a 3D pixel grid representation may be used to perform object detection using 2D convolutions.
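Projecting pixels into 3D space is commonly done with a pinhole camera model, and the sketch below shows that step alone. The pinhole model, the intrinsics `fx`, `fy`, `cx`, `cy` and the function name `backproject` are assumptions for this example; the abstract does not name a camera model, and the CNN detection stage is omitted.

```python
def backproject(depth, fx, fy, cx, cy):
    """Lift a depth image to camera-frame 3D points with a pinhole model.

    depth:  list of rows of z values (distance along the optical axis).
    fx, fy: assumed focal lengths in pixels.
    cx, cy: assumed principal point in pixels.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels without a depth estimate
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

The resulting list could then be voxelized for a 3D network, or collapsed along one axis to form a top-down grid.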
Virtual fitting systems and methods for spectacles
Various aspects of the subject technology relate to systems, methods, and machine-readable media for virtual fitting of items such as spectacles and/or spectacle frames. A user interface for virtual fitting may be implemented at a server or at a user device, and utilize three-dimensional information for the user and three-dimensional information for each frame, with frame information stored in a frame database, to identify and/or recommend frames that are likely to fit the user. Fit information can be provided for a group of frames or for each individual frame selected by the user. The fit information can be provided with a static image of the frames and/or within a virtual try-on operation in which the frames are virtually placed on a real-time image of the user.