Patent classifications
G06V20/63
VISUAL RECOGNITION USING USER TAP LOCATIONS
Methods, systems, and apparatus for receiving a query image and a user tap location, processing the received query image based on the user tap location, identifying one or more entities associated with the processed query image, and, in response to receiving (i) the query image and (ii) the user tap location, providing information about the identified one or more entities.
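One plausible form of "processing the query image based on the user tap location" is cropping a window around the tap before recognition. The abstract does not say which processing is used, so the sketch below is only an illustration: the function name `crop_around_tap`, the `half_size` parameter, and the list-of-rows image representation are all assumptions.

```python
from typing import List, Tuple

# Row-major grayscale pixels; a stand-in for a real image type (assumption).
Image = List[List[int]]


def crop_around_tap(image: Image, tap: Tuple[int, int], half_size: int) -> Image:
    """Return the square region centred on the tap, clipped to the image bounds."""
    if not image or not image[0]:
        raise ValueError("image must be non-empty")
    height, width = len(image), len(image[0])
    x, y = tap  # tap is (column, row) in pixel coordinates
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("tap location lies outside the image")
    left, right = max(0, x - half_size), min(width, x + half_size + 1)
    top, bottom = max(0, y - half_size), min(height, y + half_size + 1)
    return [row[left:right] for row in image[top:bottom]]


# 5x5 example image whose pixel value encodes its position (10 * row + column).
query_image = [[10 * r + c for c in range(5)] for r in range(5)]
cropped = crop_around_tap(query_image, tap=(1, 1), half_size=1)
```

The cropped region would then be passed to whatever entity recognizer the system uses; a tap near an edge simply yields a smaller window.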
Systems and Methods to Perform 3D Localization of Target Objects in Point Cloud Data Using A Corresponding 2D Image
The present invention relates to systems and methods to perform 3D localization of target objects in point cloud data using a corresponding 2D image. According to an illustrative embodiment of the present disclosure, a target environment is imaged with a camera to generate a 2D panorama and with a scanner to generate a 3D point cloud. The 2D panorama is mapped to the point cloud with a 1-to-1 grid map. The target objects are detected and localized in 2D before being mapped back to the 3D point cloud.
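The mapping from a 2D detection back to the point cloud can be sketched as follows, assuming the grid map stores, for each panorama pixel, the index of its corresponding 3D point. The data layout, the function names `points_in_box` and `centroid`, and the use of a centroid as the 3D location are assumptions for illustration, not details taken from the disclosure.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box2D = Tuple[int, int, int, int]


def points_in_box(grid_map: List[List[int]], cloud: List[Point3D], box: Box2D) -> List[Point3D]:
    """Collect the 3D points whose panorama pixels fall inside a 2D detection box."""
    left, top, right, bottom = box
    return [cloud[grid_map[r][c]] for r in range(top, bottom) for c in range(left, right)]


def centroid(points: List[Point3D]) -> Point3D:
    """Average the points to get a single 3D location for the detected object."""
    if not points:
        raise ValueError("no points to average")
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


# A 2x3 panorama mapped one-to-one onto six points (toy data).
cloud = [(float(i), float(2 * i), 1.0) for i in range(6)]
grid_map = [[0, 1, 2], [3, 4, 5]]
object_points = points_in_box(grid_map, cloud, box=(1, 0, 3, 2))
object_location = centroid(object_points)
```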
METHOD AND ELECTRONIC DEVICE FOR RECOGNIZING TEXT IN IMAGE
A method and an electronic device for recognizing text are provided. The method includes detecting positions of pieces of text included in the text in the image, generating cropped images by cropping areas corresponding to the pieces of text in the image, recognizing characters of the pieces of text based on the cropped images, generating a sentence by inputting the positions of the pieces of text and the characters of the pieces of text to a multimodal language model, wherein the multimodal language model is an artificial intelligence (AI) model for inferring an original sentence of the text, and displaying the sentence.
METHOD AND APPARATUS FOR ASSISTING THE DISABLED
A system for facilitating access for a disabled user includes a wrist-length glove configured for being worn over the user's hand and wrist, a LIDAR transmitter/receiver configured for emitting a laser light, receiving reflected laser light and calculating a range to objects that reflected the reflected laser light, wherein the LIDAR transmitter/receiver is removably coupled to a top of a wrist area of the glove, a mobile computing device and a mobile application for providing a text to speech program, a speech to text program, a money recognition system, a visual recognition system that detects scenes, and a proximity detection system that reads data from the LIDAR transmitter/receiver and produces speech that identifies the range to the objects that reflected the reflected laser light.
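The range calculation and spoken output can be illustrated with a short sketch. It assumes a time-of-flight measurement, which the abstract does not specify, and the function names and announcement wording are hypothetical; a real device would hand the string to its text-to-speech program.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(seconds: float) -> float:
    """Distance to the reflecting object, given the laser pulse's round-trip time."""
    if seconds < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse travels out and back, so the one-way range is half the path.
    return SPEED_OF_LIGHT_M_PER_S * seconds / 2.0


def range_announcement(range_m: float) -> str:
    """Text to be spoken by the text-to-speech program (wording is an assumption)."""
    return f"Object at {range_m:.1f} meters"


# A 20-nanosecond round trip corresponds to roughly three meters.
message = range_announcement(range_from_round_trip(2e-8))
```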
Contextually disambiguating queries
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for contextually disambiguating queries are disclosed. In an aspect, a method includes receiving an image being presented on a display of a computing device and a transcription of an utterance spoken by a user of the computing device, identifying a particular sub-image that is included in the image, and based on performing image recognition on the particular sub-image, determining one or more first labels that indicate a context of the particular sub-image. The method also includes, based on performing text recognition on a portion of the image other than the particular sub-image, determining one or more second labels that indicate the context of the particular sub-image, based on the transcription, the first labels, and the second labels, generating a search query, and providing, for output, the search query.
Text refinement network
Systems and methods for text segmentation are described. Embodiments of the inventive concept are configured to receive an image including a foreground text portion and a background portion, classify each pixel of the image as foreground text or background using a neural network that refines a segmentation prediction using a key vector representing features of the foreground text portion, wherein the key vector is based on the segmentation prediction, and identify the foreground text portion based on the classification.
Automated image-based inventory record generation systems and methods
Methods and systems for automatically creating item records for physical items. The method may include receiving an image obtained using an image sensor; detecting a physical item in the image; extracting item data regarding the physical item by applying image analysis to the image; determining, using the extracted item data, whether a memory contains an item record for the physical item; and, when no item record for the physical item exists in the memory, generating and storing in the memory a new item record for the physical item in association with the extracted item data.
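The lookup-then-create step described above can be sketched with an in-memory store. The class names, the choice of a `barcode` field as the lookup key, and the dictionary-based memory are assumptions made for illustration; the abstract does not say how records are keyed or stored.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ItemRecord:
    item_key: str
    item_data: Dict[str, str]


class ItemMemory:
    """Toy stand-in for the memory that holds item records."""

    def __init__(self) -> None:
        self._records: Dict[str, ItemRecord] = {}

    def record_for(self, extracted: Dict[str, str]) -> Tuple[ItemRecord, bool]:
        """Return (record, created); a new record is stored only if none exists."""
        key = extracted["barcode"]  # hypothetical key field
        existing = self._records.get(key)
        if existing is not None:
            return existing, False
        record = ItemRecord(item_key=key, item_data=dict(extracted))
        self._records[key] = record
        return record, True

    def __len__(self) -> int:
        return len(self._records)


memory = ItemMemory()
first, created_first = memory.record_for({"barcode": "0012345", "label": "drill"})
second, created_second = memory.record_for({"barcode": "0012345", "label": "drill"})
```

Returning the `created` flag alongside the record lets a caller distinguish a newly catalogued item from one that was already known.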
SYSTEMS AND METHODS FOR AUTOMATED METER READING
Systems and methods are provided for optically reading data from metering equipment by using a camera to record images of a meter, and converting the displayed data measurement regions of the images into alphanumeric data for reporting over a wireless network to a database system for storage, analysis and reporting. One implementation of the device is a water meter reading device integrated into a form factor that replaces the lid of a water meter box. The device captures images of the meter face, converts the images using optical character recognition software into usage data, meter identification and date/time of the data capture and sends the data to a database over a wireless data network.
MOBILE CAMERA FOR VALIDATION
A mobile camera unit includes a base having a plurality of wheels. A support extends upward from the base. At least one camera is mounted to the support. At least one processor is configured to collect images from the at least one camera. A communication circuit is configured to transmit the images. The mobile camera unit may be used to capture images of a plurality of products on a pallet or other platform. The images are transmitted to a server to identify SKUs associated with the products and to compare the identified SKUs with an order or pick list.
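The server-side comparison of identified SKUs against a pick list can be sketched as a multiset difference. The function name `validate_pallet`, the input shapes, and the report format are assumptions; the abstract only states that the comparison takes place.

```python
from collections import Counter
from typing import Dict, List


def validate_pallet(identified_skus: List[str], pick_list: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Report SKUs that are short of the pick list and SKUs that were not ordered."""
    found = Counter(identified_skus)
    expected = Counter(pick_list)
    # Counter subtraction keeps only positive counts, so each side lists real gaps.
    return {
        "missing": dict(expected - found),
        "unexpected": dict(found - expected),
    }


report = validate_pallet(
    identified_skus=["A", "A", "B", "D"],
    pick_list={"A": 2, "B": 2, "C": 1},
)
```

A pallet passes validation in this sketch when both `missing` and `unexpected` are empty.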
System and Method for Processing Insurance Cards
A system and method processes images of insurance cards to extract information. The images of the insurance cards are processed using OCR to identify characters on the insurance cards. Combinations of characters on each insurance card are identified as tokens, and their relative spatial orientation is determined. Deep learning architectures are utilized to generate a fully connected neural network with a node for each token on each card. The neural network is utilized to extract entities from each insurance card, such as a valid member ID.
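The token-and-spatial-relation structure can be sketched as a fully connected directed graph with one node per token, where each edge stores the offset from one token to another. This covers only the input structure; the deep learning model that extracts entities is not shown. The `Token` fields and the use of center coordinates are assumptions.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Token:
    text: str
    x: float  # horizontal center of the token's OCR bounding box (assumption)
    y: float  # vertical center of the token's OCR bounding box (assumption)


def build_token_graph(tokens: List[Token]) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Connect every ordered pair of tokens; each edge holds the (dx, dy) offset."""
    edges: Dict[Tuple[int, int], Tuple[float, float]] = {}
    for i, j in permutations(range(len(tokens)), 2):
        edges[(i, j)] = (tokens[j].x - tokens[i].x, tokens[j].y - tokens[i].y)
    return edges


card_tokens = [
    Token("Member", 10.0, 20.0),
    Token("ID:", 60.0, 20.0),
    Token("ABC123", 100.0, 20.0),
]
graph = build_token_graph(card_tokens)
```

With n tokens the graph has n * (n - 1) directed edges, so the three tokens here produce six.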