
Automated form understanding via layout agnostic identification of keys and corresponding values

Techniques for automated form understanding via layout-agnostic identification of keys and corresponding values are described. An embedding generator creates embeddings of pixels from an image including a representation of a form. The generated embeddings are similar for pixels within the same key-value unit, and far apart for pixels in different key-value units. A weighted bipartite graph is constructed including a first set of nodes corresponding to keys of the form and a second set of nodes corresponding to values of the form. Weights for the edges are determined based on an analysis of distances between the embeddings. The graph is partitioned according to a scheme to identify pairings between the first set of nodes and the second set of nodes that produce a minimum overall edge weight. The pairings indicate keys and values that are associated within the form.
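The minimum-overall-edge-weight pairing step can be sketched as a minimum-weight bipartite matching. The embedding centroids and distance weights below are illustrative stand-ins, not the patent's actual generator output:

```python
import itertools
import numpy as np

# Hypothetical embedding centroids for key regions and value regions.
keys = np.array([[0.1, 0.2], [0.9, 0.8], [0.5, 0.5]])
values = np.array([[0.88, 0.82], [0.12, 0.22], [0.52, 0.48]])

# Edge weights: pairwise distances between key and value embeddings.
weights = np.linalg.norm(keys[:, None, :] - values[None, :, :], axis=-1)

# Partition the bipartite graph: choose the one-to-one pairing with
# minimum total edge weight (brute force here; a real system would use
# the Hungarian algorithm, e.g. scipy.optimize.linear_sum_assignment).
best = min(itertools.permutations(range(len(values))),
           key=lambda p: sum(weights[i, p[i]] for i in range(len(keys))))
pairs = [(i, int(j)) for i, j in enumerate(best)]
print(pairs)  # each key node matched to its nearest-embedding value node
```

For more than a handful of nodes the brute-force search is infeasible; the Hungarian algorithm finds the same minimum-weight assignment in polynomial time.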

PICTURE PROCESSING METHOD, AND TASK DATA PROCESSING METHOD AND APPARATUS
20200401829 · 2020-12-24 ·

A picture processing method is provided for a computer device. The method includes obtaining a to-be-processed picture; extracting a text feature in the to-be-processed picture using a machine learning model; and determining text box proposals at arbitrary angles in the to-be-processed picture according to the text feature. Corresponding subtasks are performed by using processing units corresponding to substructures in the machine learning model, and at least part of the processing units comprise a field-programmable gate array (FPGA) unit. The method also includes performing rotation region of interest (RROI) pooling processing on each text box proposal, and projecting the text box proposal onto a feature graph of a fixed size, to obtain a text box feature graph corresponding to the text box proposal; and recognizing text in the text box feature graph, to obtain a text recognition result.
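The RROI pooling step, projecting an arbitrarily rotated proposal onto a fixed-size grid, can be sketched in NumPy. The box parameterisation and nearest-neighbour sampling are simplifying assumptions (implementations typically use bilinear interpolation):

```python
import numpy as np

def rroi_pool(feature_map, box, out_size=4):
    """Project a rotated text-box proposal onto a fixed-size grid.

    box = (cx, cy, w, h, angle_rad); nearest-neighbour sampling is a
    simplified stand-in for the bilinear interpolation usually used.
    """
    cx, cy, w, h, angle = box
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    out = np.zeros((out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # Normalised coordinates inside the box, centred at (0, 0).
            u = (j + 0.5) / out_size - 0.5
            v = (i + 0.5) / out_size - 0.5
            # Rotate the sampling point into image coordinates.
            x = cx + u * w * cos_a - v * h * sin_a
            y = cy + u * w * sin_a + v * h * cos_a
            xi = int(round(float(np.clip(x, 0, feature_map.shape[1] - 1))))
            yi = int(round(float(np.clip(y, 0, feature_map.shape[0] - 1))))
            out[i, j] = feature_map[yi, xi]
    return out

fmap = np.arange(100, dtype=float).reshape(10, 10)
pooled = rroi_pool(fmap, box=(5.0, 5.0, 6.0, 4.0, 0.3))
print(pooled.shape)  # (4, 4) regardless of the proposal's size or angle
```

Because every proposal lands on the same fixed-size grid, the downstream recognizer sees uniformly shaped inputs no matter how each text box is sized or rotated.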

GENERATING SCENE GRAPHS FROM DIGITAL IMAGES USING EXTERNAL KNOWLEDGE AND IMAGE RECONSTRUCTION
20200401835 · 2020-12-24 ·

Methods, systems, and non-transitory computer readable storage media are disclosed for generating semantic scene graphs for digital images using an external knowledgebase for feature refinement. For example, the disclosed system can determine object proposals and subgraph proposals for a digital image to indicate candidate relationships between objects in the digital image. The disclosed system can then extract relationships from an external knowledgebase for refining features of the object proposals and the subgraph proposals. Additionally, the disclosed system can generate a semantic scene graph for the digital image based on the refined features of the object/subgraph proposals. Furthermore, the disclosed system can update/train a semantic scene graph generation network based on the generated semantic scene graph. The disclosed system can also reconstruct the image using object labels based on the refined features to further update/train the semantic scene graph generation network.

DATA STRUCTURE GENERATION FOR TABULAR INFORMATION IN SCANNED IMAGES

Computer-implemented methods are provided for generating a data structure representing tabular information in a scanned image. Such a method can include storing image data representing a scanned image of a table, processing the image data to identify positions of characters and lines in the image, and mapping locations in the image of information cells, each containing a set of the characters, in dependence on said positions. The method can also include, for each cell, determining cell attribute values, dependent on the cell locations, for a predefined set of cell attributes, and supplying the attribute values as inputs to a machine-learning model trained to pre-classify cells as header cells or data cells in dependence on cell attribute values.
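The pre-classification step can be sketched with hypothetical cell attributes and a minimal nearest-neighbour stand-in for the trained model; the attribute set, training rows, and labels below are invented for illustration:

```python
# Hypothetical attribute vectors for table cells:
# [row_index, is_numeric, relative_font_weight, has_rule_below]
TRAINING = [
    ([0, 0, 1.0, 1], "header"),   # e.g. "Name"
    ([0, 0, 1.0, 1], "header"),   # e.g. "Amount"
    ([1, 0, 0.5, 0], "data"),     # e.g. "Widget"
    ([1, 1, 0.5, 0], "data"),     # e.g. "42"
    ([2, 1, 0.5, 0], "data"),     # e.g. "17"
]

def pre_classify(cell):
    """Pre-classify a cell as 'header' or 'data' by nearest neighbour,
    a minimal stand-in for the trained machine-learning model."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(TRAINING, key=lambda t: dist(t[0], cell))[1]

print(pre_classify([0, 0, 1.0, 1]))  # a bold, top-row cell
print(pre_classify([3, 1, 0.5, 0]))  # a numeric body cell
```

The point of the attribute vector is that it is derived purely from cell locations and layout, so the same classifier generalises across tables with different content.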

Angiographic Data Analysis
20200394793 · 2020-12-17 ·

A method of analysing data from an angiographic scan that provides three-dimensional information about blood vessels in a patient's brain, the method comprising the steps of: processing the data (26) to produce a three-dimensional image; extracting the system of blood vessels inside the skull, so as to obtain a vessel mask (28); skeletonising (30) the vessel mask with a thinning algorithm to produce a skeleton mask; performing a central plane extraction; analysing (32) the skeleton mask to identify voxels that have more than two neighbours, indicating a fork, bifurcation or branch; detecting the most proximal location of each of the three main supplying arteries of the head in the skeleton mask to identify starting positions; and then starting from each starting position in turn, and walking along the line representing the corresponding blood vessel to detect (34) a plurality of anatomical markers within the network of blood vessels.
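The fork-detection step, flagging skeleton voxels with more than two neighbours, can be sketched in NumPy. Face (6-connected) adjacency is an assumption here, as the abstract does not specify the neighbourhood:

```python
import numpy as np

def find_bifurcations(skeleton):
    """Return coordinates of voxels with more than two skeleton
    neighbours, i.e. forks/bifurcations/branches in a binary 3-D mask.

    Neighbours are counted in the 6-connected (face-adjacent) sense,
    an assumed simplification of the step described above.
    """
    padded = np.pad(skeleton.astype(np.uint8), 1)
    counts = np.zeros_like(padded)
    # Sum the six face neighbours of every voxel by shifting the volume.
    for axis in range(3):
        for shift in (-1, 1):
            counts += np.roll(padded, shift, axis=axis)
    counts = counts[1:-1, 1:-1, 1:-1]
    return np.argwhere((skeleton > 0) & (counts > 2))

# A toy skeleton: a straight vessel along z with one side branch.
skel = np.zeros((5, 3, 3), dtype=bool)
skel[:, 1, 1] = True        # main vessel centreline
skel[2, 1, 2] = True        # branch voxel makes (2, 1, 1) a fork
print(find_bifurcations(skel))  # [[2 1 1]]
```

The zero padding keeps the shifted-and-summed counts correct at the volume boundary, so end-of-vessel voxels (one neighbour) are never flagged.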

Object learning and recognition method and system

An object recognition apparatus, a classification tree learning apparatus, an operation method of the object recognition apparatus, and an operation method of the classification tree learning apparatus are provided. The object recognition apparatus may include an input unit to receive, as an input, a depth image representing an object to be analyzed, and a processing unit to recognize a visible object part and a hidden object part of the object, from the depth image, using a classification tree.

Skeleton-based effects and background replacement

Various embodiments of the present invention relate generally to systems and methods for analyzing and manipulating images and video. In particular, a multi-view interactive digital media representation (MVIDMR) of a person can be generated from live images of a person captured from a hand-held camera. Using the image data from the live images, a skeleton of the person and a boundary between the person and a background can be determined from different viewing angles and across multiple images. Using the skeleton and the boundary data, effects can be added to the person, such as wings. The effects can change from image to image to account for the different viewing angles of the person captured in each image.

NON-INVASIVE, INTERNAL IMAGING FOR DUAL BIOMETRIC AUTHENTICATION AND BIOMETRIC HEALTH MONITORING FOR GRANTING ACCESS UTILIZING UNIQUE INTERNAL CHARACTERISTICS OF SPECIFIC USERS
20200358762 · 2020-11-12 ·

Biometric health monitoring of a specific user or population is performed during biometric authentication for granting access to physical or digital assets. If biometric authentication, biometric verification, and biometric health monitoring are acceptable, access to the physical or digital assets is allowed. Likewise, if a health anomaly is detected in a specific user or if an outbreak is detected in a specific community, an electronic notification can be sent to the individual, a health administrator, or a government official, and access may be denied to the specific user.

Board damage classification system

A board damage classification system includes a Convolutional Neural Network (CNN) sub-engine and a Graph Convolutional Network (GCN) sub-engine that were trained based on digital images of structures that have experienced natural disasters. The CNN sub-engine receives a board digital image of a board, analyzes the board digital image to identify board features, and determines a board feature damage classification for the board features. The GCN sub-engine receives a board feature graph that was generated using the board digital image and that includes nodes that correspond to the board features in the board digital image, and defines relationships between the nodes included in the board feature graph. The board feature damage classification determined by the CNN sub-engine and the relationships defined by the GCN sub-engine are then used to generate a board damage classification that includes a damage probability for board features in the board digital image.
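The GCN sub-engine's propagation of relationships between board-feature nodes can be sketched as one normalised graph-convolution step (Kipf-Welling style). The per-feature vectors, adjacency, and weight matrix below are toy values, not outputs of the described CNN:

```python
import numpy as np

# Hypothetical per-board-feature vectors from the CNN sub-engine
# (rows: 3 board features; columns: a 4-dim damage feature vector).
X = np.array([[0.9, 0.1, 0.0, 0.2],
              [0.8, 0.2, 0.1, 0.1],
              [0.1, 0.9, 0.3, 0.0]])

# Board feature graph: adjacency over the three feature nodes.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# One graph-convolution step: each node's features are mixed with its
# neighbours', so graph relationships inform the damage classification.
A_hat = A + np.eye(3)                      # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
W = np.full((4, 2), 0.5)                   # toy weight matrix
H = np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W)
print(H.shape)  # one 2-dim embedding per board-feature node
```

A final layer over these node embeddings (e.g. a softmax per node) would yield the per-feature damage probabilities the abstract describes.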
