Patent classifications
G06V10/806
ASSESSMENT OF IMAGE QUALITY FOR OPTICAL CHARACTER RECOGNITION USING MACHINE LEARNING
Aspects of the disclosure provide for systems and processes for assessing image quality for optical character recognition (OCR), including but not limited to: segmenting an image into patches, providing the segmented image as an input into a first machine learning model (MLM), obtaining, using the first MLM, for each patch, first feature vectors representative of a reduction of imaging quality in a respective patch, and second feature vectors representative of a text content of the respective patch, providing to a second MLM the first feature vectors and the second feature vectors, and obtaining, using the second MLM, an indication of suitability of the image for OCR.
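The abstract names two chained models but discloses no concrete operators. A minimal numpy sketch of the claimed data flow, with invented stand-ins for both MLMs (the patch size, vector widths, and random projections are all assumptions, not the patented models):

```python
import numpy as np

def segment_into_patches(image, patch=8):
    """Split an H x W image into non-overlapping patch x patch tiles."""
    h, w = image.shape
    tiles = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            tiles.append(image[i:i + patch, j:j + patch])
    return np.stack(tiles)

def first_mlm(patches, rng):
    """Stand-in for the first MLM: per patch, a quality-degradation
    feature vector and a text-content feature vector (here just
    two random linear projections)."""
    flat = patches.reshape(len(patches), -1)
    wq = rng.standard_normal((flat.shape[1], 4))
    wt = rng.standard_normal((flat.shape[1], 4))
    return flat @ wq, flat @ wt

def second_mlm(quality_vecs, content_vecs):
    """Stand-in for the second MLM: pool both vector sets and map
    them to a single OCR-suitability score in [0, 1]."""
    pooled = np.concatenate([quality_vecs.mean(0), content_vecs.mean(0)])
    return 1.0 / (1.0 + np.exp(-pooled.mean()))

rng = np.random.default_rng(0)
image = rng.random((32, 32))
patches = segment_into_patches(image)
q, t = first_mlm(patches, rng)
score = second_mlm(q, t)
```

The point of the sketch is the two-stage split: per-patch features first, then a global suitability decision over all patches.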
Multi-person pose recognition method and apparatus, electronic device, and storage medium
In a multi-person pose recognition method, a to-be-recognized image is obtained and a circuitous pyramid network is constructed. The circuitous pyramid network includes parallel phases, and each phase includes downsampling network layers, upsampling network layers, and a first residual connection layer connecting the downsampling and upsampling network layers. The phases are interconnected by a second residual connection layer. The circuitous pyramid network is traversed by extracting a feature map for each phase, and the feature map of the last phase is taken as the feature map of the to-be-recognized image. Multi-person pose recognition is then performed on the to-be-recognized image according to the feature map to obtain a pose recognition result for the to-be-recognized image.
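The two residual connections can be illustrated with a toy 1-D numpy sketch (the pooling/repeat operators and phase count are invented stand-ins for the undisclosed network layers):

```python
import numpy as np

def phase(x):
    """One phase: downsampling layer, upsampling layer, and the
    first residual connection joining the two paths."""
    down = x.reshape(-1, 2).mean(axis=1)   # downsampling network layer
    up = np.repeat(down, 2)                # upsampling network layer
    return up + x                          # first residual connection

def circuitous_pyramid(x, num_phases=3):
    """Traverse the phases; the second residual connection links
    consecutive phases. The last phase's map is the final feature map."""
    feat = phase(x)
    for _ in range(num_phases - 1):
        feat = phase(feat) + feat          # second residual connection
    return feat

x = np.arange(8, dtype=float)
fmap = circuitous_pyramid(x)
```

Each phase preserves the input resolution (down then up), which is what lets both residual additions be plain element-wise sums.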
Fine-grained image recognition method, electronic device and storage medium
The present disclosure provides a fine-grained image recognition method, an electronic device and a computer-readable storage medium. The method comprises the steps of feature extraction, calculation of a feature discriminant loss function, calculation of a feature diversity loss function, and calculation of a model optimization loss function. The present disclosure comprehensively considers factors characteristic of fine-grained images, such as large intra-class differences, small inter-class differences, and strong background noise, and applies constraints such that the feature maps belonging to each class are discriminative and carry the features of the corresponding class, thus reducing the intra-class difference, decreasing the learning difficulty and learning better discriminative features. The constraints also make the feature maps belonging to each class diverse, which increases the inter-class difference, achieves good results, and is easy to deploy in practice, thereby markedly improving performance on multiple fine-grained image classification tasks.
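The abstract names the loss terms but not their formulas. One plausible reading, sketched in numpy (the specific penalties below, a within-class variance term and a pairwise-overlap term, are assumptions, as is the weighting):

```python
import numpy as np

def discriminative_loss(features, labels):
    """Pull each feature toward its class mean, shrinking the
    intra-class difference."""
    loss = 0.0
    for c in np.unique(labels):
        f = features[labels == c]
        loss += ((f - f.mean(axis=0)) ** 2).sum()
    return loss / len(features)

def diversity_loss(feature_maps):
    """Penalise pairwise overlap between normalised feature maps so
    the maps stay diverse."""
    p = feature_maps / feature_maps.sum(axis=1, keepdims=True)
    overlap = p @ p.T
    off_diag = overlap - np.diag(np.diag(overlap))
    return off_diag.sum() / (len(p) * (len(p) - 1))

def total_loss(features, labels, feature_maps, lam=0.5):
    """Model optimization loss: weighted sum of the two terms."""
    return discriminative_loss(features, labels) + lam * diversity_loss(feature_maps)

rng = np.random.default_rng(0)
features = rng.random((6, 4))
labels = np.array([0, 0, 0, 1, 1, 1])
feature_maps = rng.random((6, 8))
loss = total_loss(features, labels, feature_maps)
```

Both terms are non-negative, so minimising the sum trades off compactness within classes against diversity of the maps.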
Method and apparatus for asynchronous data fusion, storage medium and electronic device
A method and an apparatus for asynchronous data fusion, a storage medium and an electronic device are provided. The method includes: obtaining current frame LiDAR data, and determining current frame LiDAR three-dimensional embeddings; determining a previous frame fused hidden state, and performing a temporal fusion process based on the previous frame fused hidden state and the current frame LiDAR three-dimensional embeddings to generate a current frame temporary hidden state and a current frame output result; and obtaining current frame camera data, determining current frame camera three-dimensional embeddings, and generating a current frame fused hidden state based on the current frame camera three-dimensional embeddings and the current frame temporary hidden state. Asynchronous fusion is performed on the current frame LiDAR data and previous frame camera data, which leads to a low processing latency.
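The per-frame data flow, with LiDAR processed first and the later-arriving camera data folded in afterwards, can be sketched as a recurrence (the convex-blend updates below stand in for the undisclosed learned fusion operators):

```python
import numpy as np

def temporal_fusion(prev_fused_hidden, lidar_emb):
    """Fuse the previous frame's fused hidden state with the current
    frame's LiDAR embedding, yielding a temporary hidden state and
    the current frame output result."""
    temp_hidden = 0.5 * prev_fused_hidden + 0.5 * lidar_emb
    output = np.tanh(temp_hidden)
    return temp_hidden, output

def camera_fusion(temp_hidden, camera_emb):
    """Fold the later-arriving camera embedding into the temporary
    hidden state to produce the current frame's fused hidden state."""
    return 0.5 * temp_hidden + 0.5 * camera_emb

rng = np.random.default_rng(0)
hidden = np.zeros(16)
for frame in range(3):
    lidar = rng.standard_normal(16)
    hidden, out = temporal_fusion(hidden, lidar)   # LiDAR arrives first
    camera = rng.standard_normal(16)
    hidden = camera_fusion(hidden, camera)         # camera arrives later
```

Because the output is emitted right after the LiDAR step, the method never waits for the camera frame, which is the source of the claimed low latency.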
SCENE RECONSTRUCTION IN THREE-DIMENSIONS FROM TWO-DIMENSIONAL IMAGES
This specification relates to reconstructing three-dimensional (3D) scenes from two-dimensional (2D) images using a neural network. According to a first aspect of this specification, there is described a method for creating a three-dimensional reconstruction of a scene with multiple objects from a single two-dimensional image, the method comprising: receiving a single two-dimensional image; identifying all objects in the image to be reconstructed and identifying the type of said objects; estimating a three-dimensional representation of each identified object; estimating a three-dimensional plane physically supporting all three-dimensional objects; and positioning all three-dimensional objects in space relative to the supporting plane.
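The four claimed steps (detect objects and types, lift each to 3-D, estimate a supporting plane, place objects on it) can be sketched with trivial stand-ins; every function body below is invented, only the step order follows the claim:

```python
import numpy as np

def detect_objects(image):
    """Stand-in detector: each 'object' is a (type, 2-D box centre)."""
    return [("chair", np.array([0.2, 0.8])), ("table", np.array([0.6, 0.7]))]

def estimate_3d(obj_type, centre):
    """Stand-in single-image lift: a type-conditioned canonical size
    placed at a crudely back-projected centre."""
    sizes = {"chair": 0.5, "table": 1.0}
    return {"type": obj_type, "size": sizes[obj_type], "xy": centre * 4.0}

def support_plane(objects):
    """Stand-in estimate of the plane y = c supporting all objects."""
    return 0.0

def place_on_plane(obj, plane_y):
    """Position the object so its base rests on the supporting plane."""
    x, z = obj["xy"]
    return np.array([x, plane_y + obj["size"] / 2, z])

image = None  # a real pipeline would pass the single 2-D input image
objs = [estimate_3d(t, c) for t, c in detect_objects(image)]
plane = support_plane(objs)
positions = [place_on_plane(o, plane) for o in objs]
```

The key structural idea is that object placement is resolved relative to one shared plane rather than per object, which keeps the scene physically consistent.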
POINT CLOUD FEATURE ENHANCEMENT METHOD AND APPARATUS, COMPUTER DEVICE AND STORAGE MEDIUM
The present disclosure relates to a point cloud feature enhancement method and apparatus, a computer device and a storage medium. The method includes: acquiring a three-dimensional point cloud, the three-dimensional point cloud including a plurality of input points; for each input point, performing feature aggregation on neighborhood point features of the input point to obtain a first feature of the input point; mapping the first feature to an attention point corresponding to the input point; performing feature aggregation on neighborhood point features of the attention point to obtain a second feature of the input point; and performing feature fusion on the first feature and the second feature of the input point to obtain a corresponding enhanced feature. The enhancement effect of point cloud features can be improved with the method.
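A numpy sketch of the two-round aggregation (the k-NN max-pool, the offset-based attention-point mapping, and the summed fusion are all assumed stand-ins for the undisclosed learned operators):

```python
import numpy as np

def knn(points, query, k):
    """Indices of the k points nearest to the query."""
    d = ((points - query) ** 2).sum(axis=1)
    return np.argsort(d)[:k]

def aggregate(points, feats, query, k=4):
    """Max-pool features over the query's k-nearest neighbourhood."""
    return feats[knn(points, query, k)].max(axis=0)

def enhance(points, feats, k=4):
    enhanced = []
    for p in points:
        first = aggregate(points, feats, p, k)            # first feature
        attention = p + 0.1 * first[:3]                   # attention point (assumed offset map)
        second = aggregate(points, feats, attention, k)   # second feature
        enhanced.append(first + second)                   # feature fusion (assumed sum)
    return np.stack(enhanced)

rng = np.random.default_rng(0)
pts = rng.random((32, 3))
feats = rng.random((32, 3))
out = enhance(pts, feats)
```

The attention point lets each input point gather context from a second, feature-dependent neighbourhood rather than only its spatial one.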
Systems and Methods for Generating Document Numerical Representations
Described embodiments relate to a method comprising: determining a candidate document comprising image data and character data and extracting the image data and the character data from the candidate document. The method comprises providing, to an image-based numerical representation generation model, the image data, and generating, by the image-based numerical representation generation model, an image-based numerical representation of the image data. The method comprises providing, to a character-based numerical representation generation model, the character data; and generating, by the character-based numerical representation generation model, a character-based numerical representation of the character data. The method comprises providing, to a consolidated image-character based numerical representation generation model, the image-based numerical representation and the character-based numerical representation; and generating, by the consolidated image-character based numerical representation generation model, a combined image-character based numerical representation of the candidate document.
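The three-model pipeline can be sketched with toy encoders (the pooled-statistics image vector, bag-of-characters vector, and concatenate-and-normalise consolidation are invented stand-ins, not the claimed models):

```python
import numpy as np

def image_encoder(image):
    """Stand-in image-based representation model: pooled pixel stats."""
    return np.array([image.mean(), image.std(), image.max(), image.min()])

def char_encoder(text):
    """Stand-in character-based model: normalised bag of letters."""
    v = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - ord('a')] += 1
    return v / max(v.sum(), 1)

def consolidate(img_vec, char_vec):
    """Stand-in consolidated model: concatenation plus L2
    normalisation gives the combined image-character representation."""
    combined = np.concatenate([img_vec, char_vec])
    return combined / np.linalg.norm(combined)

image = np.linspace(0, 1, 64).reshape(8, 8)
doc_vec = consolidate(image_encoder(image), char_encoder("Invoice #123"))
```

The structural point is that the two modalities are encoded independently and only merged by the third model, so either encoder can be swapped without retraining the other.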
MULTI-SPECTRUM VISUAL OBJECT RECOGNITION
Aspects of the present disclosure relate to multi-spectrum visual object recognition. A first image corresponding to visible light and a second image corresponding to invisible light with respect to an object can be obtained. A first contour of the object can be identified based on the first image. A second contour of the object can be identified based on the second image. The first contour of the object and the second contour of the object can be integrated to generate a multi-spectrum contour of the object. The object can be recognized using the multi-spectrum contour of the object.
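On binary object masks the contour extraction and integration steps can be sketched directly (the boundary operator and union-style integration are assumptions; the patent does not specify them):

```python
import numpy as np

def contour(mask):
    """Boundary pixels of a binary mask: set pixels with at least
    one unset 4-neighbour."""
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~interior

# Hypothetical masks from the visible-light and invisible-light images.
visible = np.zeros((8, 8), dtype=bool); visible[2:6, 2:6] = True
infrared = np.zeros((8, 8), dtype=bool); infrared[3:7, 3:7] = True

# Integrate the two contours into a multi-spectrum contour.
multi = contour(visible) | contour(infrared)
```

A recognizer operating on `multi` sees boundary evidence from both spectra, which is the claimed benefit when either image alone yields an incomplete contour.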
Multi-Task Multi-Sensor Fusion for Three-Dimensional Object Detection
Provided are systems and methods that perform multi-task and/or multi-sensor fusion for three-dimensional object detection in furtherance of, for example, autonomous vehicle perception and control. In particular, according to one aspect of the present disclosure, example systems and methods described herein exploit simultaneous training of a machine-learned model ensemble relative to multiple related tasks to learn to perform more accurate multi-sensor 3D object detection. For example, the present disclosure provides an end-to-end learnable architecture with multiple machine-learned models that interoperate to reason about 2D and/or 3D object detection as well as one or more auxiliary tasks. According to another aspect of the present disclosure, example systems and methods described herein can perform multi-sensor fusion (e.g., fusing features derived from image data, light detection and ranging (LIDAR) data, and/or other sensor modalities) at both the point-wise and region of interest (ROI)-wise level, resulting in fully fused feature representations.
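The two fusion granularities, point-wise and ROI-wise, can be sketched as follows (the projection indices, feature widths, and mean pooling are invented; the real system uses learned continuous fusion):

```python
import numpy as np

def pointwise_fuse(lidar_feats, image_feats, proj):
    """Point-wise fusion: each LiDAR point gathers the image feature
    at its (assumed known) projected pixel index."""
    return np.concatenate([lidar_feats, image_feats[proj]], axis=1)

def roi_fuse(fused_points, roi_index):
    """ROI-wise fusion: pool the fused point features falling inside
    each region of interest."""
    return np.stack([fused_points[roi_index == r].mean(axis=0)
                     for r in np.unique(roi_index)])

rng = np.random.default_rng(0)
lidar = rng.random((10, 4))         # 10 points, 4-dim LiDAR features
image = rng.random((5, 3))          # 5 pixels, 3-dim image features
proj = rng.integers(0, 5, size=10)  # point-to-pixel projection
rois = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # two ROIs

fused = pointwise_fuse(lidar, image, proj)
roi_feats = roi_fuse(fused, rois)
```

Fusing at both levels means the ROI features are pooled from already-fused points, giving the "fully fused" representation the abstract refers to.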
Moving state analysis device, moving state analysis method, and program
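A moving state analysis device improves the accuracy of moving state recognition. A detection unit detects, from the image data of each frame constituting first video data captured in the course of movement of a first moving body, an object and the region of the object. A learning unit learns a DNN model that takes video data and sensor data as input and outputs a probability for each moving state, based on the first video data, a feature of first sensor data measured in relation to the first moving body and corresponding to the capture of the first video data, the detection result of the object and the region of the object, and information indicating the moving state associated with the first video data.

The inference side of such a model can be sketched in numpy (the per-frame detector, the pooling, the three candidate states, and the random weights are all invented stand-ins for the learned DNN):

```python
import numpy as np

def detect(frame):
    """Stand-in per-frame detector: two crude object-presence cues."""
    return np.array([frame.mean() > 0.5, frame.std() > 0.25], dtype=float)

def moving_state_probs(video, sensor_feat, w=None):
    """Stand-in DNN: pool per-frame detections, concatenate the
    sensor feature, and softmax over candidate moving states."""
    det = np.stack([detect(f) for f in video]).mean(axis=0)
    x = np.concatenate([det, sensor_feat])
    if w is None:
        rng = np.random.default_rng(0)
        w = rng.standard_normal((x.size, 3))  # 3 states assumed (e.g. walk/ride/drive)
    logits = x @ w
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(1)
video = rng.random((5, 16, 16))   # 5 frames of a toy video
sensor = rng.random(4)            # toy sensor feature
probs = moving_state_probs(video, sensor)
```

Training would fit `w` (in practice, the full DNN) against the moving-state labels associated with the first video data.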
A moving state analysis device improves accuracy of moving state recognition by including a detection unit configured to detect, from image data associated with a frame, an object and a region of the object, for each of frames that constitute first video data captured in a course of movement of a first moving body, and a learning unit configured to learn a DNN model that takes video data and sensor data as input and that outputs a probability of each moving state, based on the first video data, a feature of first sensor data measured in relation to the first moving body and corresponding to a capture of the first video data, a detection result of the object and the region of the object, and information that indicates a moving state associated with the first video data.