Patent classifications
G06V10/806
Systems and Methods for Generating Document Numerical Representations
Described embodiments relate to a method comprising: determining a candidate document comprising image data and character data, and extracting the image data and the character data from the candidate document. The method comprises providing the image data to an image-based numerical representation generation model, and generating, by the image-based numerical representation generation model, an image-based numerical representation of the image data. The method comprises providing the character data to a character-based numerical representation generation model, and generating, by the character-based numerical representation generation model, a character-based numerical representation of the character data. The method comprises providing the image-based numerical representation and the character-based numerical representation to a consolidated image-character based numerical representation generation model, and generating, by the consolidated image-character based numerical representation generation model, a combined image-character based numerical representation of the candidate document.
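The three-model pipeline above can be sketched with stand-in encoders. Everything here is illustrative, not from the patent: the toy encoders are trivial statistics, and the consolidated model is reduced to concatenation plus normalization.

```python
# Sketch of the described pipeline: two modality-specific models produce
# embeddings, and a third model consolidates them into one document
# embedding. Names like `image_encoder` are illustrative assumptions.
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def image_encoder(image_pixels):
    # Toy "model": mean and variance of pixel intensities as a 2-d embedding.
    mean = sum(image_pixels) / len(image_pixels)
    var = sum((p - mean) ** 2 for p in image_pixels) / len(image_pixels)
    return [mean, var]

def char_encoder(text):
    # Toy "model": normalized character-class counts as a 2-d embedding.
    letters = sum(c.isalpha() for c in text)
    digits = sum(c.isdigit() for c in text)
    return [letters / len(text), digits / len(text)]

def consolidate(image_vec, char_vec):
    # Consolidated model: concatenate the modality embeddings and normalize.
    return l2_normalize(image_vec + char_vec)

doc_embedding = consolidate(image_encoder([0.1, 0.9, 0.5, 0.5]),
                            char_encoder("Invoice 2024"))
```

A real system would use learned networks for all three stages; the point of the sketch is only the data flow: extract, embed per modality, then fuse.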
Road Modeling with Ensemble Gaussian Processes
This document describes road modeling with ensemble Gaussian processes. A road is modeled at a first time using at least one Gaussian process regression (GPR). A kernel function is determined based on a sample set of detections received from one or more vehicle systems. Based on the kernel function, a respective mean lateral position associated with a particular longitudinal position is determined for each GPR of the at least one GPR. The respective mean lateral position for each of the at least one GPR is aggregated to determine a combined lateral position associated with the particular longitudinal position. A road model is then output including the combined lateral position associated with the particular longitudinal position. In this way, a robust and computationally efficient road model may be determined to aid in vehicle safety and performance.
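A much-simplified sketch of the ensemble idea follows: each "GPR" below is reduced to a kernel-weighted (Nadaraya-Watson) estimate of mean lateral position, and the members' means are averaged at a query longitudinal position. A full GPR would also carry a posterior covariance; the RBF kernel and lengthscales here are illustrative assumptions.

```python
# Ensemble of kernel regressors over (longitudinal, lateral) detections,
# standing in for the ensemble of Gaussian process regressions.
import math

def rbf(x1, x2, lengthscale):
    return math.exp(-((x1 - x2) ** 2) / (2.0 * lengthscale ** 2))

def mean_lateral(detections, x_query, lengthscale):
    # detections: (longitudinal, lateral) samples from vehicle systems.
    weights = [rbf(x, x_query, lengthscale) for x, _ in detections]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, detections)) / total

def ensemble_lateral(detections, x_query, lengthscales):
    # Aggregate each member's mean lateral position into one estimate.
    means = [mean_lateral(detections, x_query, ls) for ls in lengthscales]
    return sum(means) / len(means)

samples = [(0.0, 1.0), (10.0, 1.2), (20.0, 1.4), (30.0, 1.6)]
combined = ensemble_lateral(samples, 15.0, lengthscales=[5.0, 10.0])  # → 1.3
```

On this symmetric toy road the combined lateral position at longitudinal position 15.0 is exactly 1.3, midway between the surrounding detections.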
METHOD AND ELECTRONIC DEVICE FOR RECOGNIZING PRODUCT
A method and an electronic device for recognizing a product are provided. The method includes obtaining first feature information and second feature information from an image related to a product, obtaining fusion feature information based on the first feature information and the second feature information by using a main encoder model that reflects a correlation between feature information of different modalities, matching the fusion feature information against a product database, and providing information about the product based on a result of the matching.
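The fuse-then-match flow can be sketched as follows. The "main encoder" is reduced to plain concatenation and the database match to cosine similarity; both are illustrative stand-ins, not the patent's learned cross-modal encoder.

```python
# Fuse two feature vectors from a product image and match the fused
# vector against a small in-memory database by cosine similarity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def fuse(first_features, second_features):
    # Stand-in for the main encoder: simple concatenation.
    return first_features + second_features

def match(fused, database):
    # database: mapping from product name to a stored fused feature vector.
    return max(database, key=lambda name: cosine(fused, database[name]))

db = {
    "sneaker": [1.0, 0.1, 0.9, 0.0],
    "backpack": [0.0, 0.9, 0.1, 1.0],
}
query = fuse([0.9, 0.2], [0.8, 0.1])
best = match(query, db)  # → "sneaker"
```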
HIGH-PRECISION POINT CLOUD COMPLETION METHOD BASED ON DEEP LEARNING AND DEVICE THEREOF
The present disclosure provides a high-precision point cloud completion method based on deep learning, and a device thereof, comprising the following steps: introducing the dynamic kernel convolution PAConv into a feature extraction module, learning weight coefficients according to the positional relationship between each point and its neighboring points, and adaptively constructing the convolution kernel in combination with the weight matrix. A spatial attention mechanism is added to a feature fusion module, helping the decoder better learn the relationships among features and thus better represent the feature information. A discriminator module comprises global and local attention discriminator modules, which use multi-layer fully connected networks to classify and determine whether the generated results conform to the real point cloud distribution globally and locally, respectively, so as to optimize the generated results.
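The position-adaptive idea behind PAConv can be sketched in miniature: weights over a point's neighbors are derived from their relative positions and used to aggregate neighbor features. Here a softmax over negative distances stands in for PAConv's learned ScoreNet, so nearer neighbors weigh more; the function and values are illustrative assumptions.

```python
# Position-adaptive aggregation: neighbor weights come from geometry,
# then weight the neighbors' features.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def adaptive_aggregate(point, neighbors, features):
    # neighbors: 3-d positions; features: one scalar feature per neighbor.
    dists = [math.dist(point, n) for n in neighbors]
    weights = softmax([-d for d in dists])  # nearer neighbors weigh more
    return sum(w * f for w, f in zip(weights, features))

p = (0.0, 0.0, 0.0)
nbrs = [(0.1, 0.0, 0.0), (1.0, 0.0, 0.0)]
agg = adaptive_aggregate(p, nbrs, [2.0, 10.0])
```

Because the first neighbor is ten times closer, the aggregate lands below the unweighted mean of 6.0, pulled toward that neighbor's feature value of 2.0.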
Face track recognition with multi-sample multi-view weighting
In one embodiment, a method determines known features for existing face tracks that have identity labels and builds a database (face models) from these features. The face tracks may contain multiple different views of a face, and multiple features from those faces may be taken to build the face models. For an unlabeled face track without identity information, the method determines its sampled features and finds labeled nearest-neighbor features, with respect to multiple feature spaces, from the face models. For each face in the unlabeled face track, the method decomposes the face as a linear combination of its neighbors from the known features in the face models. The method then determines weights for the known features to weight their coefficients. Particular embodiments use a non-linear weighting function to learn the weights, which provides more accurate labels.
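The weighted nearest-neighbor labelling step can be sketched as follows: an unlabeled face feature is compared with labeled features, each neighbor's contribution is passed through a non-linear weighting function, and the identity with the largest total weight wins. The Gaussian weighting below is one plausible choice of non-linear function, not necessarily the patent's.

```python
# Weighted neighbor voting over labeled face features.
import math
from collections import defaultdict

def weight(distance, sigma=0.5):
    # Non-linear weighting: nearby neighbors count much more than far ones.
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))

def label_face(query, labeled_features):
    # labeled_features: list of (feature_vector, identity_label) pairs.
    scores = defaultdict(float)
    for feat, label in labeled_features:
        scores[label] += weight(math.dist(query, feat))
    return max(scores, key=scores.get)

db = [
    ([0.0, 0.0], "alice"), ([0.1, 0.1], "alice"),
    ([1.0, 1.0], "bob"),
]
identity = label_face([0.05, 0.05], db)  # → "alice"
```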
Image recognition method and apparatus, device, and computer storage medium
An image recognition method is provided, relating to the technical field of artificial intelligence and, in particular, to image processing. An implementation includes: performing facial-feature (five sense organs) recognition on a preprocessed human face image and marking the positions of the facial features in the image, to obtain a marked human face image; determining human face images at multiple scales from the marked human face image, inputting the human face images at the multiple scales into a backbone network model, and performing feature extraction to obtain a wrinkle feature of the human face image at each of the multiple scales; and fusing the wrinkle features at each scale that are located in the same area of the human face image, to obtain a wrinkle recognition result for the human face image.
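The final fusion step alone can be sketched like this: given a wrinkle score per facial region at each scale, scores that fall in the same region are fused, here by simple averaging. Region names, scores, and the averaging rule are illustrative assumptions.

```python
# Fuse per-region wrinkle scores across scales by averaging the scores
# that belong to the same facial region.
def fuse_scales(per_scale_features):
    # per_scale_features: one {region: wrinkle_score} dict per scale.
    regions = per_scale_features[0].keys()
    n = len(per_scale_features)
    return {r: sum(scale[r] for scale in per_scale_features) / n
            for r in regions}

features = [
    {"forehead": 0.8, "eye_corner": 0.3},   # full resolution
    {"forehead": 0.6, "eye_corner": 0.5},   # half resolution
]
result = fuse_scales(features)  # → forehead 0.7, eye_corner 0.4
```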
Sensor fusion for autonomous machine applications using machine learning
In various examples, a multi-sensor fusion machine learning model—such as a deep neural network (DNN)—may be deployed to fuse data from a plurality of individual machine learning models. As such, the multi-sensor fusion network may use outputs from a plurality of machine learning models as input to generate a fused output that represents data from fields of view or sensory fields of each of the sensors supplying the machine learning models, while accounting for learned associations between boundary or overlap regions of the various fields of view of the source sensors. In this way, the fused output may be less likely to include duplicate, inaccurate, or noisy data with respect to objects or features in the environment, as the fusion network may be trained to account for multiple instances of a same object appearing in different input representations.
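A hand-written caricature of what such a fusion network learns implicitly: detections from overlapping fields of view that refer to the same object are merged rather than duplicated. A fixed distance threshold stands in here for the learned association between overlap regions; all values are illustrative.

```python
# Merge per-sensor detections, averaging near-duplicates that two
# sensors report for the same object in an overlap region.
import math

def fuse_detections(per_sensor_detections, merge_radius=1.0):
    fused = []
    for detections in per_sensor_detections:
        for det in detections:
            for i, kept in enumerate(fused):
                if math.dist(det, kept) < merge_radius:
                    # Same object seen by two sensors: average the positions.
                    fused[i] = tuple((a + b) / 2 for a, b in zip(kept, det))
                    break
            else:
                fused.append(det)
    return fused

camera = [(10.0, 2.0), (30.0, -1.0)]
radar = [(10.3, 2.1), (55.0, 0.0)]  # first radar hit overlaps the camera's
objects = fuse_detections([camera, radar])  # → 3 objects, not 4
```

The learned network replaces both the threshold and the averaging with associations trained end to end, which is what makes the fused output robust to noisy or duplicated inputs.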
Method and apparatus for data efficient semantic segmentation
A method and system for training a neural network are provided. The method includes receiving an input image, selecting at least one data augmentation method from a pool of data augmentation methods, generating an augmented image by applying the selected at least one data augmentation method to the input image, and generating a mixed image from the input image and the augmented image.
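The described steps map directly onto a few lines of code on a toy one-channel "image": pick one augmentation from a pool, apply it, then mix the original and augmented images. The MixUp-style convex blend and the two-augmentation pool below are illustrative assumptions; the patent's exact mixing rule may differ.

```python
# Select an augmentation, apply it, and mix original with augmented.
import random

def flip(img):
    return [row[::-1] for row in img]

def brighten(img, delta=0.1):
    return [[min(1.0, p + delta) for p in row] for row in img]

def mix(a, b, lam):
    # Pixel-wise convex combination of the two images.
    return [[lam * pa + (1 - lam) * pb for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def augment_and_mix(img, rng, lam=0.5):
    augmentation = rng.choice([flip, brighten])   # the augmentation pool
    return mix(img, augmentation(img), lam)

rng = random.Random(0)
image = [[0.2, 0.8], [0.4, 0.6]]
mixed = augment_and_mix(image, rng)
```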
ACTION RECOGNITION METHOD AND APPARATUS, AND DEVICE AND STORAGE MEDIUM
An action recognition method and apparatus, and a device and a storage medium. The method comprises: grouping original compressed video data to obtain grouped video data (101); inputting the grouped video data into a first preset model and determining, according to an output result of the first preset model, target grouped video data that includes an action (102); decoding the target grouped video data to obtain grouped video data to be recognized (103); and inputting the grouped video data to be recognized into a second preset model and determining, according to an output result of the second preset model, the type of action contained in the grouped video data to be recognized (104).
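The four numbered steps can be sketched with stand-in models: group the compressed stream, keep the groups a cheap first-stage model flags as containing action, "decode" those, then classify with a second-stage model. Both models below are trivial threshold rules for illustration only.

```python
# Steps (101)-(104) with toy models over a per-frame motion proxy.
def group(frames, size):
    return [frames[i:i + size] for i in range(0, len(frames), size)]

def has_action(grp):
    # First preset model: flag groups whose mean motion magnitude is high.
    return sum(grp) / len(grp) > 0.5

def classify(grp):
    # Second preset model: toy mapping from peak motion to an action type.
    return "running" if max(grp) > 0.9 else "walking"

motion = [0.1, 0.2, 0.8, 0.95, 0.1, 0.0]        # per-frame motion proxy
groups = group(motion, 2)                        # (101)
targets = [g for g in groups if has_action(g)]   # (102)
decoded = targets                                # decoding stub (103)
labels = [classify(g) for g in decoded]          # (104) → ["running"]
```

The design point the patent exploits is that step (102) runs on the still-compressed data, so the expensive decode (103) happens only for groups that actually contain action.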
MULTIMODAL MEDICAL IMAGE FUSION METHOD BASED ON DARTS NETWORK
A multimodal medical image fusion method based on a DARTS network is provided. Feature extraction is performed on a multimodal medical image using a differentiable architecture search (DARTS) network. In the search phase, the network learns using the gradient of the network weights as a loss function. A network architecture best suited to the current dataset is selected from different convolution operations and connections between nodes, so that the features extracted by the network carry richer detail. In addition, a plurality of indicators that represent image grayscale information, correlation, detail information, structural features, and image contrast are used as the network loss function, so that effective fusion of medical images can be achieved through unsupervised learning, without a gold standard.
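A toy sketch of the architecture-selection idea: candidate operations compete, and the search keeps whichever one minimizes an unsupervised fusion loss, echoing the gold-standard-free objective described above. Real DARTS relaxes this choice into softmax-weighted architecture parameters updated by gradient descent; the candidate ops and the contrast-based loss below are illustrative assumptions.

```python
# Pick the fusion operation that minimizes an unsupervised detail loss.
def mean_fuse(a, b):
    return [(x + y) / 2 for x, y in zip(a, b)]

def max_fuse(a, b):
    return [max(x, y) for x, y in zip(a, b)]

def detail_loss(fused):
    # Unsupervised proxy loss: penalize low contrast (low variance),
    # standing in for the multi-indicator loss in the abstract.
    m = sum(fused) / len(fused)
    return -sum((x - m) ** 2 for x in fused)

def select_op(a, b, candidates):
    # Search step reduced to choosing the candidate op with lowest loss.
    return min(candidates, key=lambda op: detail_loss(op(a, b)))

ct = [0.1, 0.9, 0.2, 0.8]    # toy grayscale rows from two modalities
mri = [0.9, 0.1, 0.8, 0.2]
best = select_op(ct, mri, [mean_fuse, max_fuse])  # → max_fuse
```

On these toy rows, averaging cancels the complementary detail of the two modalities while the max keeps it, so the search selects `max_fuse`.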