G06V30/18143

METHOD OF DETECTING, SEGMENTING AND EXTRACTING SALIENT REGIONS IN DOCUMENTS USING ATTENTION TRACKING SENSORS

A method and system for detecting, segmenting, and extracting salient regions in documents by using attention tracking sensors is provided. The method includes: receiving an image that corresponds to a document; receiving, from a sensor, a sequence of measurements that correspond to a human reading of the document; determining, based on the sequence of measurements, at least one region of the document as being a salient document region; demarcating the salient document region in an electronically displayable manner; and outputting a file that includes a displayable version of the document with the demarcated document region. The salient document region may include a title, a section header, and/or a table. The sensor may be an eye-tracking sensor that detects a sequence of eye-gaze positions on the document as a function of time.
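One plausible way to turn the sequence of eye-gaze measurements into salient regions is to accumulate dwell time over a spatial grid and keep the cells a reader lingered on. The grid size, threshold, and function names below are illustrative assumptions, not claim language:

```python
from collections import defaultdict

def salient_regions(gaze_samples, cell=100, dwell_threshold=0.5):
    """Accumulate gaze dwell time per grid cell and return the cells whose
    total dwell exceeds the threshold, as (x0, y0, x1, y1) boxes.

    gaze_samples: list of (x, y, timestamp) tuples, timestamps in seconds.
    """
    dwell = defaultdict(float)
    for (x, y, t), (_, _, t_next) in zip(gaze_samples, gaze_samples[1:]):
        dwell[(int(x // cell), int(y // cell))] += t_next - t
    return [(cx * cell, cy * cell, (cx + 1) * cell, (cy + 1) * cell)
            for (cx, cy), d in dwell.items() if d >= dwell_threshold]

# A reader lingering over a title region near the top of the page,
# then briefly glancing at the body text:
samples = [(120, 40, 0.0), (130, 45, 0.3), (125, 50, 0.6),
           (128, 42, 0.9), (400, 700, 1.0), (405, 705, 1.05)]
boxes = salient_regions(samples, cell=100, dwell_threshold=0.5)
# Only the title cell accumulates enough dwell time to be demarcated.
```

The returned boxes could then be drawn onto the displayable version of the document as the demarcated salient regions.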

Autonomous driving with surfel maps

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using a surfel map to generate a prediction for a state of an environment. One of the methods includes obtaining surfel data comprising a plurality of surfels, wherein each surfel corresponds to a respective different location in an environment, and each surfel has associated data that comprises an uncertainty measure; obtaining sensor data for one or more locations in the environment, the sensor data having been captured by one or more sensors of a first vehicle; determining one or more particular surfels corresponding to respective locations of the obtained sensor data; and combining the surfel data and the sensor data to generate a respective object prediction for each of the one or more locations of the obtained sensor data.
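The abstract says each surfel carries an uncertainty measure that is combined with live sensor data. A minimal sketch of one such fusion rule, inverse-uncertainty weighting, is below; the scalar probability representation and the weighting scheme are assumptions, since the abstract does not fix either:

```python
def combine(surfel_prob, surfel_uncertainty, sensor_prob, sensor_uncertainty):
    """Fuse a surfel-map prior with a live sensor reading by weighting each
    source by the inverse of its uncertainty (one simple way to honor the
    per-surfel uncertainty measure the abstract describes)."""
    w_map = 1.0 / surfel_uncertainty
    w_sensor = 1.0 / sensor_uncertainty
    return (w_map * surfel_prob + w_sensor * sensor_prob) / (w_map + w_sensor)

# A confident map prior (low uncertainty) dominates a noisy sensor reading:
p = combine(surfel_prob=0.9, surfel_uncertainty=0.1,
            sensor_prob=0.3, sensor_uncertainty=0.9)
```

Applied per surfel at each location where sensor data was captured, this yields the per-location object predictions the method describes.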

DESIGN OPTIMIZATION AND USE OF CODEBOOKS FOR DOCUMENT ANALYSIS
20230028992 · 2023-01-26

A method of generating and optimizing a codebook for document analysis comprises: receiving a first set of document images; extracting a plurality of keypoint regions from each document image of the first set of document images; calculating local descriptors for each keypoint region of the extracted keypoint regions; clustering the local descriptors such that each center of a cluster of local descriptors corresponds to a respective visual word; generating a codebook containing a set of visual words; and optimizing the codebook by maximizing mutual information (MI) between a target field of a second set of document images and at least one visual word of the set of visual words.
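The MI-based optimization step can be illustrated with the standard empirical mutual-information formula between a binary "visual word present" indicator and a binary target-field indicator across documents. The binary encoding is an assumed simplification for illustration:

```python
import math
from collections import Counter

def mutual_information(word_present, field_value):
    """Empirical MI (in bits) between a visual-word presence indicator and
    a target-field indicator, computed over a set of document images."""
    n = len(word_present)
    joint = Counter(zip(word_present, field_value))
    px = Counter(word_present)
    py = Counter(field_value)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p_xy / (p_x * p_y), expressed with counts to avoid extra divisions
        mi += p_xy * math.log2(c * n / (px[x] * py[y]))
    return mi

# A visual word that co-occurs perfectly with the target field carries 1 bit:
field = [1, 1, 0, 0]
mi_high = mutual_information([1, 1, 0, 0], field)
# A word independent of the field carries 0 bits:
mi_low = mutual_information([1, 0, 1, 0], field)
```

Keeping the visual words with the highest MI against the target field is one way to realize the "optimizing the codebook" step.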

METHOD AND APPARATUS FOR PROCESSING DOCUMENT IMAGE, AND ELECTRONIC DEVICE

In a method for processing a document image, a document image to be processed is acquired. Text nodes of multiple granularities, visual nodes of multiple granularities, respective node information of the text nodes, and respective node information of the visual nodes in the document image are obtained. A multi-granularity and multi-modality document graph is constructed based on the text nodes of multiple granularities, the visual nodes of multiple granularities, the respective node information of the text nodes and the respective node information of the visual nodes. Multi-granularity semantic feature information of the document image is determined based on the multi-granularity and multi-modality document graph, the respective node information of the text nodes and the respective node information of the visual nodes.
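A minimal sketch of how such a graph could be assembled: finer-granularity text nodes are linked to the coarser text node containing them, and text nodes are linked to visual nodes whose regions contain them. The node format (a granularity level plus a bounding box) is an assumption, since the abstract does not specify the node information:

```python
def build_document_graph(text_nodes, visual_nodes):
    """Build edges for a multi-granularity, multi-modality document graph.

    text_nodes: list of (granularity, (x0, y0, x1, y1)) tuples,
                where granularity 0 is the finest (e.g. word level).
    visual_nodes: list of (x0, y0, x1, y1) boxes.
    """
    def contains(outer, inner):
        ox0, oy0, ox1, oy1 = outer
        ix0, iy0, ix1, iy1 = inner
        return ox0 <= ix0 and oy0 <= iy0 and ix1 >= ix1 * 0 + ix1 and ix1 <= ox1 and iy1 <= oy1

    edges = []
    for i, (gran_i, box_i) in enumerate(text_nodes):
        for j, (gran_j, box_j) in enumerate(text_nodes):
            # link a fine text node to the next-coarser node containing it
            if gran_j == gran_i + 1 and contains(box_j, box_i):
                edges.append(("text", i, "text", j))
        for k, box_v in enumerate(visual_nodes):
            # link a text node to a visual node whose region contains it
            if contains(box_v, box_i):
                edges.append(("text", i, "visual", k))
    return edges

# A word (granularity 0) inside a line (granularity 1), both inside one
# visual region:
text_nodes = [(0, (10, 10, 50, 20)), (1, (5, 8, 200, 22))]
visual_nodes = [(0, 0, 300, 100)]
edges = build_document_graph(text_nodes, visual_nodes)
```

The resulting graph would then feed a graph-based model that produces the multi-granularity semantic features.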

TRAINING A NEURAL NETWORK FOR ACTION RECOGNITION
20220398832 · 2022-12-15

A system for training a neural network for action recognition based on unlabeled action sequences includes a first neural network (NN1) and a second neural network (NN2). A first updating module is arranged to update parameters of NN1 to minimize a difference between representation data generated by NN1 and representation data generated by NN2. A second updating module is arranged to update parameters of NN2 as a function of the parameters of NN1. An augmentation module includes first and second sub-modules and is configured to include augmented versions of incoming action sequences in first and second input data. The first and second sub-modules are configured to apply at least partly different augmentation to the incoming action sequences. After NN1 and NN2 have been operated on one or more instances of the first and second input data, NN1 comprises a parameter definition of a pre-trained neural network.
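The second updating module only requires that NN2's parameters be "a function of the parameters of NN1". A common choice for that function, assumed here for illustration (it resembles BYOL-style self-supervised training, which the abstract does not name), is an exponential moving average:

```python
def ema_update(params_nn1, params_nn2, momentum=0.99):
    """Update NN2's parameters as an exponential moving average of NN1's.

    This is one possible realization of the second updating module; the
    momentum value and the EMA rule itself are assumptions."""
    return [momentum * p2 + (1 - momentum) * p1
            for p1, p2 in zip(params_nn1, params_nn2)]

# With momentum 0.9, NN2 moves 10% of the way toward NN1 each step:
nn1 = [1.0, 2.0]
nn2 = [0.0, 0.0]
nn2 = ema_update(nn1, nn2, momentum=0.9)
```

Meanwhile the first updating module would train NN1 by gradient descent on the difference between the two networks' representations of the differently augmented action sequences.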

IMAGE-CAPTURING APPARATUS, IMAGE PROCESSING SYSTEM, IMAGE PROCESSING METHOD, AND PROGRAM
20220366574 · 2022-11-17

Provided are an image-capturing apparatus, an image processing system, an image processing method, and a program that make it possible to prevent a moving object from being detected in image information. An image-capturing apparatus according to the present technology includes an image processing circuit. The image processing circuit detects feature points in each of a plurality of images captured at a specified frame rate, and performs processing of calculating a moving-object weight for each detected feature point a plurality of times.
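One way the repeated moving-object weight calculation could work is to compare each feature point's frame-to-frame displacement against the dominant (camera) motion and let the weight grow with the residual. The residual rule and the smoothing factor below are assumptions for illustration:

```python
def update_moving_weights(displacements, global_motion, weights, alpha=0.5):
    """One pass of the repeated weight calculation: a feature point whose
    displacement deviates from the dominant (camera-induced) motion
    accumulates a higher moving-object weight."""
    gx, gy = global_motion
    new_weights = []
    for (dx, dy), w in zip(displacements, weights):
        residual = ((dx - gx) ** 2 + (dy - gy) ** 2) ** 0.5
        # exponential smoothing across the plurality of calculations
        new_weights.append((1 - alpha) * w + alpha * residual)
    return new_weights

# Two static points move with the camera; a third point moves on its own:
disp = [(1.0, 0.0), (1.0, 0.0), (5.0, 3.0)]
w = update_moving_weights(disp, global_motion=(1.0, 0.0),
                          weights=[0.0, 0.0, 0.0])
```

Feature points whose weight stays high over several frames could then be excluded, preventing the moving object from contaminating the detected image information.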

APPARATUS FOR SEPARATING FEATURE POINTS FOR EACH OBJECT, METHOD FOR SEPARATING FEATURE POINTS FOR EACH OBJECT AND COMPUTER PROGRAM

An object-specific keypoint separation apparatus includes: an inference execution unit configured to receive a captured image of an object as an input and use a pre-trained model to output a plurality of first maps and a plurality of second maps generated from the input captured image, the plurality of first maps storing, for the keypoints of the object, a distance from a first keypoint only around a second keypoint, and the plurality of second maps representing heat maps configured to have a peak at the coordinates at which a keypoint of the object appears; and an object-specific keypoint separation unit configured to separate the keypoints for each object based on the plurality of first maps and the plurality of second maps output from the inference execution unit.
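The separation step can be sketched as follows: for each second keypoint (found as a heat-map peak), the first-map value at its location points back toward its object's first keypoint, and the keypoint is assigned to the nearest candidate. The flat data layout below is a deliberate simplification of the patent's per-pixel maps:

```python
def separate_keypoints(first_kps, second_kps, offsets):
    """Group keypoints by object. For each second keypoint, the stored
    offset vector (standing in for the first-map value around it) predicts
    the position of its object's first keypoint; the keypoint is assigned
    to the closest first keypoint to that prediction."""
    objects = {i: [fk] for i, fk in enumerate(first_kps)}
    for sk, (ox, oy) in zip(second_kps, offsets):
        predicted = (sk[0] + ox, sk[1] + oy)
        best = min(range(len(first_kps)),
                   key=lambda i: (first_kps[i][0] - predicted[0]) ** 2
                               + (first_kps[i][1] - predicted[1]) ** 2)
        objects[best].append(sk)
    return objects

first_kps = [(10, 10), (100, 10)]   # e.g. head keypoints of two people
second_kps = [(12, 40), (98, 42)]   # e.g. their torso keypoints
offsets = [(-2, -30), (2, -32)]     # stored vectors back toward the head
groups = separate_keypoints(first_kps, second_kps, offsets)
```

Each entry of `groups` then holds the keypoints belonging to one object, which is exactly the per-object separation the apparatus produces.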

Keypoint unwarping for machine vision applications

An image processing system has one or more memories and image processing circuitry coupled to the one or more memories. The image processing circuitry, in operation, compares a first image to feature data in a comparison image space using a matching model. The comparing includes: unwarping keypoints in keypoint data of the first image; and comparing the unwarped keypoints and descriptor data associated with the first image to the feature data of the comparison image. The image processing circuitry determines whether the first image matches the comparison image based on the comparing.
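The unwarping step amounts to mapping keypoint coordinates from the first image back into the comparison image space before matching descriptors. A minimal sketch with an assumed 2x3 affine warp model (the patent does not fix the warp model):

```python
def unwarp_keypoints(keypoints, inv_affine):
    """Apply a 2x3 inverse affine transform (a, b, tx, c, d, ty) to
    keypoint coordinates so they can be compared against feature data
    in the comparison image space."""
    a, b, tx, c, d, ty = inv_affine
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in keypoints]

# Undo a pure translation of (5, -3) applied to the first image:
kps = [(10.0, 10.0), (20.0, 5.0)]
unwarped = unwarp_keypoints(kps, (1.0, 0.0, -5.0, 0.0, 1.0, 3.0))
```

After unwarping, the keypoints and their descriptors are compared against the comparison image's feature data, and the match decision is made on the result.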

METHOD AND SYSTEM FOR TRAINING NEURAL NETWORK FOR ENTITY DETECTION

Disclosed is a system and method for training a neural network to be implemented for detecting at least one entity in a document to derive relevant inferences therefrom. The method comprises: obtaining at least one document; processing the at least one document via a detection module to detect a widget entity, wherein the detected widget entity is classified as active or inactive based on a detected state of the widget entity; modifying the classified widget entity into a corresponding machine-readable widget entity based on the detected state; processing the at least one document via an extraction module to detect a text entity in the near vicinity of the classified widget entity; generating a training pair comprising the machine-readable widget entity and the corresponding text entity; and training the neural network using the generated training pair.
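The pairing of a classified widget with its nearby text can be sketched as a nearest-neighbor search within a distance cutoff. The token encoding of the machine-readable widget and the distance cutoff are assumptions for illustration:

```python
def build_training_pairs(widgets, texts, max_dist=50.0):
    """Pair each classified widget (e.g. a checkbox with an active or
    inactive state) with the nearest text entity within max_dist,
    yielding (machine-readable widget, text) training pairs."""
    pairs = []
    for wx, wy, state in widgets:
        best, best_d = None, max_dist
        for tx, ty, label in texts:
            d = ((wx - tx) ** 2 + (wy - ty) ** 2) ** 0.5
            if d < best_d:
                best, best_d = label, d
        if best is not None:
            token = "[CHECKED]" if state == "active" else "[UNCHECKED]"
            pairs.append((token, best))
    return pairs

# Two checkboxes in a form, each with a caption to its right:
widgets = [(10, 10, "active"), (10, 60, "inactive")]
texts = [(40, 12, "I agree to the terms"), (40, 58, "Subscribe to updates")]
pairs = build_training_pairs(widgets, texts)
```

The resulting pairs form the supervision the neural network is trained on.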

AUTOMATED IMAGE ADS

A system for generating an advertisement is provided. The system may receive an advertisement request from a client device and select an advertisement from a database in response to the advertisement request. The system may identify an advertiser web server associated with the advertisement, for example the server hosting the advertisement's landing page. The system may retrieve a picture from the advertiser web server and integrate the picture with the advertisement to generate an enhanced advertisement. The system may serve the enhanced advertisement to the client device.