G06V30/274

Entity Recognition Method and Apparatus, and Computer Program Product

An entity recognition method and apparatus, an electronic device, a storage medium, and a computer program product are provided. The method includes: recognizing a to-be-recognized image to determine a preliminary recognition result for entities in the to-be-recognized image; determining, in response to determining that the preliminary recognition result includes a plurality of entities of a same category, image features of the to-be-recognized image and textual features of the plurality of entities; determining, based on the image features and the textual features, whether the plurality of entities constitutes a single consecutive complete entity, to obtain a complete-entity determination result; and obtaining a final recognition result based on the preliminary recognition result and the complete-entity determination result.
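
The abstract leaves the merge decision unspecified. The sketch below (all names hypothetical, feature extraction assumed done elsewhere) illustrates the flow with a simple cosine-similarity rule between the per-category image and text feature vectors to decide whether same-category entities should be joined into one consecutive entity.

```python
import math
from collections import defaultdict

def cosine(a, b):
    # similarity between an image feature vector and a text feature vector
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def is_complete_entity(image_feat, text_feat, threshold=0.8):
    # stand-in for the patent's (unspecified) complete-entity classifier
    return cosine(image_feat, text_feat) >= threshold

def final_result(preliminary, image_feats, text_feats, threshold=0.8):
    # preliminary: [{"category": ..., "text": ...}, ...] in reading order
    by_cat = defaultdict(list)
    for e in preliminary:
        by_cat[e["category"]].append(e)
    result = []
    for cat, ents in by_cat.items():
        if len(ents) > 1 and is_complete_entity(
                image_feats[cat], text_feats[cat], threshold):
            # several same-category hits judged to be one consecutive entity
            result.append({"category": cat,
                           "text": " ".join(e["text"] for e in ents)})
        else:
            result.extend(ents)
    return result
```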

Semantic cluster formation in deep learning intelligent assistants

Enhanced techniques and circuitry are presented herein for providing responses to questions from among digital documentation sources spanning various documentation formats, versions, and types. One example includes a method comprising: receiving an indication of a question directed to a subject having a documentation corpus; determining a set of passages of the documentation corpus related to the question; ranking the set of passages according to relevance to the question; forming semantic clusters comprising sentences extracted from the ranked passages according to sentence similarity; and providing a response to the question based at least on a selected semantic cluster.
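
A minimal sketch of the pipeline, under stated assumptions: relevance is reduced to token overlap with the question and sentence similarity to Jaccard overlap (a real assistant would use learned embeddings), with the largest cluster selected as the response.

```python
def tokens(text):
    return set(text.lower().split())

def rank_passages(question, passages):
    # rank by token overlap with the question (toy relevance measure)
    q = tokens(question)
    return sorted(passages, key=lambda p: len(q & tokens(p)), reverse=True)

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_sentences(sentences, threshold=0.3):
    # greedy clustering: a sentence joins the first cluster whose seed
    # sentence it resembles, else starts a new cluster
    clusters = []
    for s in sentences:
        for c in clusters:
            if jaccard(tokens(s), tokens(c[0])) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters

def answer(question, passages, top_k=2):
    ranked = rank_passages(question, passages)[:top_k]
    sentences = [s.strip() for p in ranked for s in p.split(".") if s.strip()]
    clusters = cluster_sentences(sentences)
    best = max(clusters, key=len)  # pick the largest cluster as the response
    return " ".join(best)
```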

Semantic map production system and method

The system includes: a metric map creation unit configured to create a metric map using first image data received from a 3D sensor; an image processing unit configured to recognize an object by creating and classifying a point cloud using second image data received from an RGB camera; a probability-based map production unit configured to create an object location map and a spatial semantic map in a probabilistic representation using a processing result of the image processing unit; a question creation unit configured to extract, on the basis of entropy, a portion of the produced map with high uncertainty about an object class and ask a user about the portion; and a map update unit configured to receive a response from the user and update a probability distribution for spatial information according to a change in the probability distribution for classification of the object.
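
Two pieces of the described loop can be sketched concretely (function names and the blending rule are illustrative, not from the patent): the question creation unit picks the map cell whose class distribution has the highest entropy, and the map update unit folds the user's answer back into that distribution.

```python
import math

def entropy(dist):
    # Shannon entropy of a {class_name: probability} distribution
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def most_uncertain(cell_dists):
    # cell_dists: {cell_id: {class_name: probability}}
    # the cell worth asking the user about is the highest-entropy one
    return max(cell_dists, key=lambda c: entropy(cell_dists[c]))

def update_with_answer(dist, answered_class, weight=0.7):
    # blend the user's answer into the class distribution and renormalize
    updated = {c: (1 - weight) * p for c, p in dist.items()}
    updated[answered_class] = updated.get(answered_class, 0.0) + weight
    total = sum(updated.values())
    return {c: p / total for c, p in updated.items()}
```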

Neural network processing for multi-object 3D modeling

Embodiments are directed to neural network processing for multi-object three-dimensional (3D) modeling. An embodiment of a computer-readable storage medium includes executable computer program instructions for obtaining data from multiple cameras, the data including multiple images, and generating a 3D model for 3D imaging based at least in part on the data from the cameras, wherein generating the 3D model includes one or more of performing processing with a first neural network to determine temporal direction based at least in part on motion of one or more objects identified in an image of the multiple images or performing processing with a second neural network to determine semantic content information for an image of the multiple images.

Contextual span framework

A phrase that includes a trigger word that modifies a meaning within the phrase is received. The trigger word is identified. The subset of words modified by the trigger word is identified by analyzing features of the phrase that link the trigger word to other words. The phrase is then interpreted by modifying that subset of words according to the modification indicated by the trigger word.
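
A dependency-free sketch of the idea, assuming a fixed trigger lexicon and a crude scope heuristic (the span of words after the trigger up to the next punctuation or conjunction); the abstract's "features that link the trigger word to other words" would be far richer in practice.

```python
TRIGGERS = {"not": "NEGATE", "very": "INTENSIFY"}  # illustrative lexicon
BOUNDARIES = {"but", "and", ",", "."}

def interpret(phrase):
    # separate punctuation so it can act as a scope boundary
    words = phrase.lower().replace(",", " ,").replace(".", " .").split()
    spans = []
    for i, w in enumerate(words):
        if w in TRIGGERS:
            scope = []
            for nxt in words[i + 1:]:
                if nxt in BOUNDARIES:
                    break
                scope.append(nxt)
            # (trigger, modification, words the trigger modifies)
            spans.append((w, TRIGGERS[w], scope))
    return spans
```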

METHODS FOR MOBILE IMAGE CAPTURE OF VEHICLE IDENTIFICATION NUMBERS IN A NON-DOCUMENT
20180012100 · 2018-01-11

Various embodiments disclosed herein are directed to methods of capturing Vehicle Identification Numbers (VINs) from images captured by a mobile device. Capturing VIN data can be useful in several applications, for example, insurance data capture applications. At least two types of images are supported by this technology: (1) images of documents and (2) images of non-documents.

MULTI-DOMAIN CONVOLUTIONAL NEURAL NETWORK

In one embodiment, an apparatus comprises a memory and a processor. The memory is to store visual data associated with a visual representation captured by one or more sensors. The processor is to: obtain the visual data associated with the visual representation captured by the one or more sensors, wherein the visual data comprises uncompressed visual data or compressed visual data; process the visual data using a convolutional neural network (CNN), wherein the CNN comprises a plurality of layers, wherein the plurality of layers comprises a plurality of filters, and wherein the plurality of filters comprises one or more pixel-domain filters to perform processing associated with uncompressed data and one or more compressed-domain filters to perform processing associated with compressed data; and classify the visual data based on an output of the CNN.
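
The routing claimed here can be shown with the learned filters reduced to trivial stand-in callables (everything below is illustrative, not the patented CNN): uncompressed input goes through a pixel-domain path, compressed input (e.g. DCT-like coefficients) through a compressed-domain path, and both feed one classification head.

```python
def pixel_filter(x):
    # pixel-domain filter stand-in: normalize raw 8-bit pixel values
    return [v / 255.0 for v in x]

def compressed_filter(x):
    # compressed-domain filter stand-in: work on coefficient magnitudes
    return [abs(v) for v in x]

def classify(features):
    # toy classification head over the filtered features
    return "bright" if sum(features) / len(features) > 0.5 else "dark"

def multi_domain_cnn(visual_data, compressed):
    # dispatch to the domain-appropriate filters, then classify
    feats = compressed_filter(visual_data) if compressed else pixel_filter(visual_data)
    return classify(feats)
```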

METHOD AND APPARATUS FOR RETRIEVING TARGET

A method and an apparatus for retrieving a target are provided. The method may include: obtaining at least one image and a description text of a designated object; extracting image features of the image and text features of the description text by using a pre-trained cross-media feature extraction network; and matching the image features with the text features to determine an image that contains the designated object.
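
The matching step can be sketched as nearest-neighbour retrieval over feature vectors (in the patent, both sides come from a pre-trained cross-media network; here the vectors are assumed given and cosine similarity is one plausible matching rule):

```python
import math

def cosine(a, b):
    # similarity between the text features and one image's features
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(text_feat, image_feats):
    # image_feats: {image_id: feature_vector}
    # return the image whose features best match the description text
    return max(image_feats, key=lambda k: cosine(text_feat, image_feats[k]))
```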

INFORMATION EXTRACTION METHOD AND APPARATUS, ELECTRONIC DEVICE AND READABLE STORAGE MEDIUM

The present disclosure provides an information extraction method and apparatus, an electronic device and a readable storage medium, and relates to the field of natural language processing technologies. The information extraction method includes: acquiring a to-be-extracted text; acquiring a sample set, the sample set including a plurality of sample texts and labels of sample characters in the plurality of sample texts; determining a prediction label of each character in the to-be-extracted text according to a semantic feature vector of each character in the to-be-extracted text and a semantic feature vector of each sample character in the sample set; and extracting, according to the prediction label of each character, a character meeting a preset requirement from the to-be-extracted text as an extraction result of the to-be-extracted text. The present disclosure can simplify steps of information extraction, reduce costs of information extraction and improve flexibility and accuracy of information extraction.
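
The label-prediction step reads like nearest-neighbour matching over per-character semantic vectors; assuming the vectors are already computed, a 1-nearest-neighbour sketch (names and the Euclidean metric are assumptions) looks like this:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_labels(text_vecs, sample_vecs, sample_labels):
    # each character takes the label of its most similar sample character
    preds = []
    for v in text_vecs:
        i = min(range(len(sample_vecs)), key=lambda j: dist(v, sample_vecs[j]))
        preds.append(sample_labels[i])
    return preds

def extract(text, text_vecs, sample_vecs, sample_labels, wanted="ENTITY"):
    # keep only characters whose predicted label meets the preset requirement
    labels = predict_labels(text_vecs, sample_vecs, sample_labels)
    return "".join(c for c, l in zip(text, labels) if l == wanted)
```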

System and method for improving localization and object tracking

In one embodiment, a computing system is configured to, during a first tracking session, detect first landmarks in a first image of the environment surrounding a user and determine a first location of the user by comparing the detected first landmarks to a landmark database. During a second tracking session, the computing system captures motion data and estimates a second location of the user based on the motion data and the first location. The computing system detects second landmarks in a second image captured at the second location, accesses from the landmark database the landmarks expected to be visible at the estimated second location, and determines that the estimated second location is inaccurate by comparing the expected landmarks with the second landmarks. The computing system then re-localizes the user by comparing third landmarks in a third image to the landmark database.
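
The consistency check and re-localization step can be sketched with landmarks reduced to string IDs (overlap ratio and the 0.5 threshold are assumptions; the patent does not fix a metric): if too few of the landmarks expected at the estimated location are actually detected, the pose estimate is flagged as inaccurate and the system re-localizes against the database.

```python
def location_is_accurate(expected, detected, min_overlap=0.5):
    # compare landmarks expected at the estimated location with those seen
    if not expected:
        return True
    return len(set(expected) & set(detected)) / len(set(expected)) >= min_overlap

def relocalize(landmark_db, detected):
    # landmark_db: {location_id: [landmark_ids visible there]}
    # return the database location whose landmarks best match what we see
    return max(landmark_db, key=lambda loc: len(set(landmark_db[loc]) & set(detected)))
```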