G06V10/806

Image Recognition Method and System of Convolutional Neural Network Based on Global Detail Supplement
20230368497 · 2023-11-16 ·

An image recognition method and system based on a convolutional neural network with global detail supplement, as follows: acquire the image to be recognized and input it to a trained feature extraction network to obtain the features of each stage; learn detail features from the image to be recognized and extract a detail feature map; use a self-attention mechanism to fuse the feature map output at the last stage with the detail feature map to obtain global detail features; fuse the global detail features with the features of each stage to obtain the features after global detail supplement; and classify according to the features after global detail supplement, where the category with the maximum computed value is the image classification result. The invention constructs a convolutional neural network based on global detail supplement and uses progressive training for fine-grained image classification, further improving fine-grained classification accuracy.
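The fusion and classification steps can be illustrated with a minimal sketch. It is not the patented network: feature maps are assumed to be flattened into lists of token vectors, the self-attention uses no learned projections, the fusion with each stage is assumed to be element-wise addition, and each stage vector is assumed to already hold per-class scores. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]

def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention without learned projections (a simplification)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

def global_detail_feature(last_stage: List[Vector], detail: List[Vector]) -> Vector:
    """Attend over the joint token set, then average-pool to a single vector."""
    fused = self_attention(last_stage + detail)
    d = len(fused[0])
    return [sum(t[i] for t in fused) / len(fused) for i in range(d)]

def supplement(stage_features: List[Vector], g: Vector) -> List[Vector]:
    """Add the global detail vector to each stage's feature vector (assumed fusion rule)."""
    return [[s + x for s, x in zip(stage, g)] for stage in stage_features]

def classify(features: List[Vector]) -> int:
    """Sum the per-stage scores and return the index of the maximum value."""
    totals = [sum(col) for col in zip(*features)]
    return totals.index(max(totals))

# Hypothetical two-class example with two stages.
stages = [[0.2, 0.5], [0.1, 0.3]]
g = global_detail_feature([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0]])
print(classify(stages), classify(supplement(stages, g)))
```

In this toy input the supplement shifts the decision from class 1 to class 0, which is the kind of effect the global detail features are meant to have on the per-stage features.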

Systems and methods for constructing and utilizing field-of-view (FOV) information

Described herein are systems, methods, and non-transitory computer readable media for constructing and utilizing vehicle field-of-view (FOV) information. The FOV information can be utilized in connection with vehicle localization such as localization of an autonomous vehicle (AV), sensor data fusion, or the like. A customized computing machine can be provided that is configured to construct and utilize the FOV information. The customized computing machine can utilize the FOV information, and more specifically, FOV semantics data included therein to manage various data and execution patterns relating to processing performed in connection with operation of an AV such as, for example, data prefetch operations, reordering of sensor data input streams, and allocation of data processing among multiple processing cores.

METHODS AND APPARATUS FOR HUMAN POSE ESTIMATION FROM IMAGES USING DYNAMIC MULTI-HEADED CONVOLUTIONAL ATTENTION

An apparatus for 3D human pose estimation using a dynamic multi-headed convolutional attention mechanism is presented. The apparatus contains two dynamic multi-headed convolutional attention mechanisms, one with spatial attention and another with temporal attention. The spatial attention mechanism is leveraged to extract frame-wise inter-joint dependencies by analyzing sections of limbs that are related. The temporal attention mechanism extracts global inter-frame relationships by analyzing correlations between the temporal profiles of joints. The temporal profile mechanism leads to a more diverse temporal attention map while achieving substantial parameter reduction.

Multiple object detection method and apparatus

Disclosed are a multiple object detection method and apparatus. The multiple object detection apparatus includes a feature map extraction unit for extracting a plurality of multi-scale feature maps based on an input image, and a feature map fusion unit for generating a multi-scale fusion feature map including context information by fusing adjacent multi-scale feature maps among the plurality of multi-scale feature maps generated by the feature map extraction unit.
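One common way to fuse adjacent multi-scale feature maps is to upsample the coarser map and add it to the finer one. The sketch below assumes that scheme (nearest-neighbor upsampling plus element-wise addition), which the abstract does not specify; the function names are hypothetical, and feature maps are plain single-channel 2D lists.

```python
from typing import List

FeatureMap = List[List[float]]

def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Repeat every value `factor` times along both axes."""
    return [[v for v in row for _ in range(factor)]
            for row in fmap for _ in range(factor)]

def fuse_adjacent(maps: List[FeatureMap]) -> List[FeatureMap]:
    """Fuse each map with its coarser neighbor; `maps` is ordered fine to coarse."""
    fused = []
    for fine, coarse in zip(maps, maps[1:]):
        factor = len(fine) // len(coarse)
        up = upsample_nearest(coarse, factor)
        fused.append([[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(fine, up)])
    return fused

# Hypothetical 4x4 and 2x2 maps from two adjacent scales.
fine = [[1.0] * 4 for _ in range(4)]
coarse = [[1.0, 2.0], [3.0, 4.0]]
for row in fuse_adjacent([fine, coarse])[0]:
    print(row)
```

Each fused map keeps the resolution of the finer input while carrying context from the coarser one.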

Computer aided diagnosis system for detecting tissue lesion on microscopy images based on multi-resolution feature fusion
11810297 · 2023-11-07 ·

Embodiments of the present disclosure include a method, device, and computer-readable medium involving receiving image data to detect tissue lesions, passing the image data through at least one first convolutional neural network, segmenting the image data, fusing the segmented image data, and detecting tissue lesions.

TRACK SEGMENT CLEANING OF TRACKED OBJECTS
20230360379 · 2023-11-09 ·

Provided are methods for track segment cleaning of tracked objects using neural networks, which can include detecting a first track segment and a second track segment. The method includes applying a machine learning model trained to determine if the first track segment and second track segment capture real objects and if the first track segment and the second track segment are representative of an identical object exterior to a vehicle. The method further includes combining the first track segment and the second track segment to form a single track segment having a single trajectory in response to the first track segment and the second track segment being determined to be representative of the identical object. Systems and computer program products are also provided.
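The merge step can be sketched as follows. The trained machine learning model is replaced here by two hypothetical stand-in predicates (`looks_real` and `close_in_space_time`), and segments that are not merged are simply returned unchanged, which is an assumption; the abstract only describes the merge itself.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]  # (timestamp, x, y)

@dataclass
class TrackSegment:
    points: List[Point]

def merge_if_same_object(
    a: TrackSegment,
    b: TrackSegment,
    is_real: Callable[[TrackSegment], bool],
    same_object: Callable[[TrackSegment, TrackSegment], bool],
) -> List[TrackSegment]:
    """Combine two segments into one trajectory when both checks pass."""
    if is_real(a) and is_real(b) and same_object(a, b):
        # Sorting tuples orders the points by timestamp first.
        return [TrackSegment(sorted(a.points + b.points))]
    return [a, b]

# Stand-ins for the trained model's two decisions (illustrative only).
def looks_real(seg: TrackSegment) -> bool:
    return len(seg.points) >= 2

def close_in_space_time(a: TrackSegment, b: TrackSegment, max_gap: float = 1.5) -> bool:
    t1, x1, y1 = a.points[-1]
    t2, x2, y2 = b.points[0]
    return 0 <= t2 - t1 <= max_gap and math.hypot(x2 - x1, y2 - y1) <= max_gap

first = TrackSegment([(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)])
second = TrackSegment([(2.0, 2.0, 0.0), (3.0, 3.0, 0.0)])
print(len(merge_if_same_object(first, second, looks_real, close_in_space_time)))  # prints 1
```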

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, INFORMATION PROCESSING PROGRAM, AND INFORMATION PROCESSING SYSTEM
20230351412 · 2023-11-02 ·

An information processing apparatus (100) includes: a personal authentication unit (111) that performs personal authentication of a seller; and a product identification unit (112) that identifies whether a received product received by a purchaser of an authentic product posted for selling by the seller who has undergone the personal authentication matches the authentic product based on a product feature of the authentic product and a product feature of the received product.

METHOD, COMPUTER-READABLE MEDIUM, AND ELECTRONIC DEVICE FOR IMAGE TEXT RECOGNITION
20230360183 · 2023-11-09 ·

An image text recognition method includes: converting an image into a grayscale image; segmenting the grayscale image, according to the layer intervals to which the grayscale values of its pixels belong, into grayscale layers, each corresponding to one layer interval; performing image erosion on a grayscale layer to obtain a feature layer corresponding to the grayscale layer, the feature layer including at least one connected region; overlaying the feature layers to obtain an overlaid feature layer that includes connected regions; dilating the connected regions on the overlaid feature layer along a preset direction to obtain text regions; and performing text recognition on the text regions on the overlaid feature layer to obtain the recognized text corresponding to the image.
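The layering, erosion, overlay, and directional dilation steps can be sketched on a tiny grayscale image. This is an illustration under stated assumptions: the input is already grayscale, the layer intervals and the 1x3 horizontal structuring element are arbitrary choices, the preset direction is taken to be horizontal, and the final text recognition step is omitted. All function names are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]

def split_into_layers(gray: List[List[int]], intervals: List[Tuple[int, int]]) -> List[Mask]:
    """One binary mask per [low, high) grayscale interval."""
    return [[[1 if lo <= p < hi else 0 for p in row] for row in gray]
            for lo, hi in intervals]

def erode_horizontal(mask: Mask) -> Mask:
    """Keep a pixel only if it and both horizontal neighbors are set."""
    w = len(mask[0])
    return [[int(0 < x < w - 1 and row[x - 1] and row[x] and row[x + 1])
             for x in range(w)] for row in mask]

def overlay(layers: List[Mask]) -> Mask:
    """Pixel-wise OR across all feature layers."""
    return [[int(any(px)) for px in zip(*rows)] for rows in zip(*layers)]

def dilate_horizontal(mask: Mask, radius: int) -> Mask:
    """Set a pixel if any pixel within `radius` columns in the same row is set."""
    w = len(mask[0])
    return [[int(any(row[max(0, x - radius):min(w, x + radius + 1)]))
             for x in range(w)] for row in mask]

# One row: a dark stroke, a mid-gray stroke, and an isolated dark noise pixel.
gray = [[255, 0, 0, 0, 255, 255, 100, 100, 100, 255, 0, 255]]
layers = split_into_layers(gray, [(0, 64), (64, 128)])
features = [erode_horizontal(layer) for layer in layers]
text_mask = dilate_horizontal(overlay(features), radius=2)
print(text_mask)  # [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]]
```

Erosion removes the isolated noise pixel, and the horizontal dilation joins the two strokes into a single candidate text region.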

MULTIMODAL MACHINE LEARNING IMAGE AND TEXT COMBINED SEARCH METHOD
20230368509 · 2023-11-16 ·

Methods, systems, and computer-readable storage media for a multimodal machine learning image and text combined search method. One example method includes processing items that each have an associated image and a textual description. A first image feature vector is generated by processing a first image using a first machine learning model. A first textual feature vector is generated by processing a first textual description using a second machine learning model. The first image feature vector and the first textual feature vector are combined to generate a first combined feature vector for a first item. Similarity lists of similar items are generated for the first item based on similarities between the first image feature vector, the first textual feature vector, the first combined feature vector, and the respective corresponding vectors of other items. The similarity lists for the first item are combined to generate a combined similarity list for the first item.
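The similarity-list flow can be sketched with precomputed feature vectors. The sketch assumes the combined vector is a concatenation, similarity is cosine similarity, and the lists are merged by averaging scores; the abstract does not specify any of these choices, and the two machine learning models are replaced by hand-written vectors. Function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity_list(query_id: str, vectors: Dict[str, Vector]) -> Dict[str, float]:
    """Similarity of every other item to the query item."""
    q = vectors[query_id]
    return {i: cosine(q, v) for i, v in vectors.items() if i != query_id}

def combined_similarity_list(
    query_id: str,
    image_vecs: Dict[str, Vector],
    text_vecs: Dict[str, Vector],
) -> List[Tuple[str, float]]:
    # Combined vector: concatenation of image and text vectors (assumed).
    combined_vecs = {i: list(image_vecs[i]) + list(text_vecs[i]) for i in image_vecs}
    lists = [similarity_list(query_id, v) for v in (image_vecs, text_vecs, combined_vecs)]
    # Merge the three lists by averaging the scores (assumed merge rule).
    merged = {i: sum(l[i] for l in lists) / len(lists) for i in lists[0]}
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical feature vectors for three items.
image_vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
text_vecs = {"a": [0.0, 1.0], "b": [0.1, 0.9], "c": [1.0, 0.0]}
print(combined_similarity_list("a", image_vecs, text_vecs))
```

For the query item "a", item "b" ranks first because it is close in both modalities, while "c" is orthogonal in both and scores zero.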

COUNTERFACTUAL DEBIASING INFERENCE FOR COMPOSITIONAL ACTION RECOGNITION
20230368529 · 2023-11-16 ·

One or more computer processors improve action recognition by removing inference introduced by visual appearances of objects within a received video segment. The one or more computer processors extract appearance information and structure information from a received video segment. The one or more computer processors calculate a factual inference (TE) for the received video segment utilizing the extracted appearance information and structure information. The one or more computer processors calculate a counterfactual debiasing inference (NDE) for the received video segment. The one or more computer processors calculate a total indirect effect (TIE) by subtracting the calculated counterfactual debiasing inference from the calculated factual inference. The one or more computer processors recognize the action in the received video segment by selecting a classification result associated with a highest calculated TIE.
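The final two steps reduce to a per-class subtraction followed by an argmax. The sketch below assumes the factual (TE) and counterfactual (NDE) inferences are already available as per-class scores; the scores and function names are hypothetical.

```python
from typing import Dict

def total_indirect_effect(te: Dict[str, float], nde: Dict[str, float]) -> Dict[str, float]:
    """TIE per class: factual score (TE) minus counterfactual score (NDE)."""
    return {label: te[label] - nde[label] for label in te}

def recognize_action(te: Dict[str, float], nde: Dict[str, float]) -> str:
    """Select the class with the highest TIE."""
    tie = total_indirect_effect(te, nde)
    return max(tie, key=tie.get)

# Hypothetical per-class scores for one video segment.
te = {"push": 0.70, "pull": 0.60, "lift": 0.20}
nde = {"push": 0.55, "pull": 0.20, "lift": 0.10}
print(recognize_action(te, nde))  # prints "pull"
```

Here "push" has the highest factual score, but most of it is explained by appearance alone, so the debiased result is "pull".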