G06V10/454

Deep direct localization from ground imagery and location readings
11710251 · 2023-07-25

In one embodiment, a method includes receiving an image associated with an object in an environment, the image being captured by sensors associated with a vehicle, generating a feature representation of the image, determining a potential ground control point associated with the object based on the feature representation of the image, determining a predetermined location reading based on the potential ground control point, calculating a differential relative to the predetermined location reading based on the potential ground control point, and determining a location of the vehicle based on the differential and the predetermined location reading based on the potential ground control point.
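The final step above can be pictured with a minimal sketch. The abstract does not say how the differential and the location reading are combined, so this example assumes a flat local frame in metres and a purely additive offset; `Reading` and `locate_vehicle` are hypothetical names, not terms from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """A position in a local planar frame, in metres (assumed convention)."""
    east_m: float
    north_m: float


def locate_vehicle(gcp_reading: Reading, differential: Reading) -> Reading:
    """Combine a ground control point's stored reading with a differential.

    Assumption: the differential is (vehicle position - control point position),
    so the vehicle location is the stored reading plus the differential.
    """
    return Reading(
        east_m=gcp_reading.east_m + differential.east_m,
        north_m=gcp_reading.north_m + differential.north_m,
    )


if __name__ == "__main__":
    surveyed = Reading(east_m=100.0, north_m=200.0)   # predetermined reading
    offset = Reading(east_m=-3.5, north_m=1.25)       # differential estimated from the image
    print(locate_vehicle(surveyed, offset))
```

In the method described, the differential would come from the learned feature representation of the image; here it is supplied directly.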

MACHINE LEARNING IMAGE PROCESSING

A machine learning image processing system performs natural language processing (NLP) and auto-tagging for an image matching process. The system facilitates an interactive process, e.g., through a mobile application, to obtain an image and supplemental user input from a user to execute an image search. The supplemental user input may be provided from a user as speech or text, and NLP is performed on the supplemental user input to determine user intent and additional search attributes for the image search. Using the user intent and the additional search attributes, the system performs image matching on stored images that are tagged with attributes through an auto-tagging process.

Text recognition for a neural network
11710304 · 2023-07-25

Image data having text associated with a plurality of text-field types is received, the image data including target image data and context image data. The target image data includes target text associated with a text-field type. The context image data provides a context for the target image data. A trained neural network that is constrained to a set of characters for the text-field type is applied to the image data. The trained neural network identifies the target text of the text-field type using a vector embedding that is based on learned patterns for recognizing the context provided by the context image data. One or more predicted characters are provided for the target text of the text-field type in response to identifying the target text using the trained neural network.
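One way to picture a recognizer that is constrained to a character set per text-field type is to restrict decoding to the allowed characters. The sketch below is only an illustration of that idea: the abstract does not state how the constraint is implemented, and `FIELD_CHARSETS`, `decode_field`, and the score dictionaries are hypothetical.

```python
# Hypothetical character sets for two text-field types.
FIELD_CHARSETS = {
    "date": "0123456789/",
    "amount": "0123456789.,",
}


def decode_field(scores_per_position, field_type):
    """Pick, at each position, the best-scoring character allowed for the field.

    scores_per_position: list of dicts mapping a character to a score.
    Characters outside the field's set are never chosen, however high they score.
    """
    allowed = FIELD_CHARSETS[field_type]
    decoded = []
    for scores in scores_per_position:
        best = max(allowed, key=lambda ch: scores.get(ch, float("-inf")))
        decoded.append(best)
    return "".join(decoded)


if __name__ == "__main__":
    # "l" and "O" outscore the digits but are not valid in a date field.
    scores = [{"1": 0.2, "l": 0.7}, {"0": 0.4, "O": 0.5}]
    print(decode_field(scores, "date"))
```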

Target detection method and apparatus, computer-readable storage medium, and computer device

This application relates to a target detection method performed at a computer device. The method includes: obtaining a to-be-detected image; extracting a first image feature and a second image feature corresponding to the to-be-detected image; performing dilated convolution on the second image feature to obtain a third image feature corresponding to the to-be-detected image; performing classification and regression on the first image feature and the third image feature to determine candidate position parameters corresponding to a target object in the to-be-detected image and degrees of confidence corresponding to the candidate position parameters; and selecting a valid position parameter from the candidate position parameters according to their corresponding degrees of confidence, and determining a position of the target object in the to-be-detected image according to the valid position parameter. The solutions in this application can improve robustness and reduce time consumption.
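Two of the steps above, dilated convolution and confidence-based selection, can be sketched in a few lines. This is a one-dimensional toy under stated assumptions, not the claimed method: the selection rule (highest confidence above a threshold) and the names `dilated_conv1d` and `select_valid` are illustrative choices.

```python
def dilated_conv1d(signal, kernel, dilation=2):
    """Valid-mode 1-D dilated convolution (cross-correlation form).

    Kernel taps are spaced `dilation` samples apart, which widens the
    receptive field without adding weights.
    """
    span = (len(kernel) - 1) * dilation + 1  # input samples covered by one output
    return [
        sum(k * signal[start + i * dilation] for i, k in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def select_valid(candidates, min_confidence=0.5):
    """Return the position parameter with the highest confidence, or None.

    candidates: list of (position_parameter, confidence) pairs.
    The threshold-then-argmax rule is an assumption for illustration.
    """
    kept = [c for c in candidates if c[1] >= min_confidence]
    if not kept:
        return None
    return max(kept, key=lambda c: c[1])[0]


if __name__ == "__main__":
    print(dilated_conv1d([1, 2, 3, 4, 5, 6], [1, 0, -1], dilation=2))
    print(select_valid([((0, 0, 10, 10), 0.3), ((2, 2, 8, 8), 0.9)]))
```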

Systems and methods for improving the classification of objects

Systems, methods, and other embodiments described herein relate to improving the classification of objects depicted in a scene. In one embodiment, a method includes generating, using an ontological detector, a type classification of a detected object according to a detector ontology of known classes. The detected object is represented as segmented data from sensor data about a surrounding environment. The method includes, in response to determining that the type classification specifies an unknown class that is not defined in the detector ontology, annotating the segmented data as unknown. The method includes providing the segmented data to specify that the type classification for the detected object is unknown.
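The annotation step can be illustrated with a short sketch. The ontology contents, the dictionary layout of a segment, and the function name `annotate_segment` are assumptions made for the example; the abstract specifies none of them.

```python
# Hypothetical detector ontology of known classes.
DETECTOR_ONTOLOGY = frozenset({"car", "pedestrian", "cyclist"})


def annotate_segment(segment, type_classification, ontology=DETECTOR_ONTOLOGY):
    """Return a copy of the segment annotated with its type classification.

    If the classification is not a class defined in the ontology,
    the segment is annotated as "unknown" instead.
    """
    label = type_classification if type_classification in ontology else "unknown"
    return {**segment, "type": label}


if __name__ == "__main__":
    segment = {"id": 7, "points": [(0.0, 1.0), (0.5, 1.2)]}
    print(annotate_segment(segment, "scooter"))
```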

Method and apparatus for mammographic multi-view mass identification

A method, applied to an apparatus for mammographic multi-view mass identification, includes receiving a main image, a first auxiliary image, and a second auxiliary image. The main image and the first auxiliary image are images of a breast of a person, and the second auxiliary image is an image of the other breast of the person. The method further includes detecting the nipple location based on the main image and the first auxiliary image; generating a first probability map of the main image based on the main image, the first auxiliary image, and the nipple location; generating a second probability map of the main image based on the main image, the second auxiliary image, and the nipple location; and generating and outputting a fused probability map based on the first probability map and the second probability map.
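The fusion step can be sketched as follows. The abstract does not state the fusion rule, so this example assumes a plain element-wise mean of two equally sized maps; `fuse_probability_maps` is a hypothetical name.

```python
def fuse_probability_maps(map_a, map_b):
    """Fuse two probability maps of the same shape by element-wise averaging.

    Assumption: a simple mean. A learned or weighted fusion would be
    equally consistent with the abstract.
    """
    if len(map_a) != len(map_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(map_a, map_b)
    ):
        raise ValueError("probability maps must have the same shape")
    return [
        [(a + b) / 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(map_a, map_b)
    ]


if __name__ == "__main__":
    first = [[0.0, 1.0], [0.5, 0.25]]    # from main + first auxiliary image
    second = [[1.0, 0.5], [0.5, 0.75]]   # from main + second auxiliary image
    print(fuse_probability_maps(first, second))
```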

Human body attribute recognition method and apparatus, electronic device, and storage medium

The present disclosure describes human body attribute recognition methods and apparatus, electronic devices, and a storage medium. The method includes acquiring a sample image containing a plurality of to-be-detected areas labeled with true values of human body attributes; generating, through a recognition model, a heat map of the sample image and heat maps of the to-be-detected areas to obtain a global heat map and local heat maps; fusing the global and the local heat maps to obtain a fused image, and performing human body attribute recognition on the fused image to obtain predicted values; determining a focus area of each type of human body attribute according to the global and the local heat maps; correcting the recognition model by using the focus area, the true values, and the predicted values; and performing, based on the corrected recognition model, human body attribute recognition on a to-be-recognized image.

IMAGE PROCESSING NEURAL NETWORKS WITH SEPARABLE CONVOLUTIONAL LAYERS
20230237314 · 2023-07-27

A neural network system is configured to receive an input image and to generate a classification output for the input image. The neural network system includes: a separable convolution subnetwork comprising a plurality of separable convolutional neural network layers arranged in a stack one after the other, in which each separable convolutional neural network layer is configured to: separately apply both a depthwise convolution and a pointwise convolution during processing of an input to the separable convolutional neural network layer to generate a layer output.
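A separable convolutional layer of the kind described can be sketched in plain Python. The example assumes the common ordering (depthwise first, then pointwise), valid padding, and stride 1; the abstract fixes none of these, and the function names are illustrative.

```python
def depthwise_conv(x, kernels):
    """Filter each channel with its own k-by-k kernel (valid padding, stride 1).

    x: nested lists shaped [C][H][W]; kernels: nested lists shaped [C][k][k].
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        out_h = len(channel) - k + 1
        out_w = len(channel[0]) - k + 1
        out.append([
            [
                sum(kernel[i][j] * channel[r + i][c + j]
                    for i in range(k) for j in range(k))
                for c in range(out_w)
            ]
            for r in range(out_h)
        ])
    return out


def pointwise_conv(x, weights):
    """Mix channels at every pixel with a 1x1 convolution.

    x: [C][H][W]; weights: [C_out][C].
    """
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w[ch] * x[ch][r][c] for ch in range(len(x))) for c in range(width)]
            for r in range(height)
        ]
        for w in weights
    ]


def separable_conv(x, kernels, weights):
    """Depthwise convolution followed by pointwise convolution (assumed order)."""
    return pointwise_conv(depthwise_conv(x, kernels), weights)


if __name__ == "__main__":
    x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]              # 2 channels, 2x2
    kernels = [[[1, 1], [1, 1]], [[1, 0], [0, 1]]]        # one 2x2 kernel per channel
    weights = [[1, 1], [2, -1]]                           # 2 output channels
    print(separable_conv(x, kernels, weights))
```

Splitting the spatial filtering from the channel mixing is what reduces the parameter count relative to a full convolution over all channels at once.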

MULTI-DOMAIN CONVOLUTIONAL NEURAL NETWORK

In one embodiment, an apparatus comprises a memory and a processor. The memory is to store visual data associated with a visual representation captured by one or more sensors. The processor is to: obtain the visual data associated with the visual representation captured by the one or more sensors, wherein the visual data comprises uncompressed visual data or compressed visual data; process the visual data using a convolutional neural network (CNN), wherein the CNN comprises a plurality of layers, wherein the plurality of layers comprises a plurality of filters, and wherein the plurality of filters comprises one or more pixel-domain filters to perform processing associated with uncompressed data and one or more compressed-domain filters to perform processing associated with compressed data; and classify the visual data based on an output of the CNN.

SYSTEM AND METHODS TO OPTIMIZE NEURAL NETWORKS USING SENSOR FUSION
20230237784 · 2023-07-27

A method for optimizing a neural network is provided, including: (1) capturing, via a first sensor group having a first field of view, a first sample set having a first sensor domain corresponding to the first field of view; (2) capturing, via a second sensor group having a second field of view, a second sample set having a second sensor domain corresponding to the second field of view; (3) generating regions of interest of the second sample set; (4) translating the regions of interest to the first sensor domain; (5) identifying nodes of the neural network which correspond to the translated regions; and (6) optimizing the neural network by at least one of (a) increasing the weight value of the nodes corresponding to the one or more translated regions and (b) decreasing the weight value of the nodes not corresponding to the one or more translated regions.
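Step (6) can be sketched as a simple reweighting pass. The abstract does not say by how much the weights change or how nodes are indexed, so the fixed additive `step`, the flat weight list, and the name `reweight_nodes` are assumptions for illustration. The example applies both (a) and (b), although the method requires only one of them.

```python
def reweight_nodes(weights, roi_nodes, step=0.1):
    """Raise weights of nodes tied to translated regions, lower the rest.

    weights: list of node weight values.
    roi_nodes: set of node indices that correspond to translated regions
               of interest (steps 3 to 5 are assumed to have produced it).
    step: assumed fixed additive adjustment.
    """
    return [
        w + step if index in roi_nodes else w - step
        for index, w in enumerate(weights)
    ]


if __name__ == "__main__":
    weights = [1.0, 2.0, 4.0]
    print(reweight_nodes(weights, roi_nodes={1}, step=0.5))
```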