G06V30/18152

DESIGN OPTIMIZATION AND USE OF CODEBOOKS FOR DOCUMENT ANALYSIS
20230028992 · 2023-01-26 ·

A method of generating and optimizing a codebook for document analysis comprises: receiving a first set of document images; extracting a plurality of keypoint regions from each document image of the first set of document images; calculating local descriptors for each keypoint region of the extracted keypoint regions; clustering the local descriptors such that each center of a cluster of local descriptors corresponds to a respective visual word; generating a codebook containing a set of visual words; and optimizing the codebook by maximizing mutual information (MI) between a target field of a second set of document images and at least one visual word of the set of visual words.
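The pipeline above can be sketched in a few lines: k-means over local descriptors yields cluster centers (visual words), and a mutual-information score between a binary "word occurs in image" indicator and a binary "target field occurs in image" indicator can drive codebook optimization. All function names, the plain k-means clustering, and the binary-indicator MI formulation are illustrative assumptions, not the patent's actual implementation.

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Cluster local descriptors with plain k-means; each center is a visual word.
    (A stand-in for whatever clustering the patent actually uses.)"""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iters):
        # assign every descriptor to its nearest center
        dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def mutual_information(word_occurs, field_occurs):
    """MI (in nats) between two binary per-image indicator arrays:
    'visual word occurs in image' vs. 'target field occurs in image'."""
    mi = 0.0
    for w in (0, 1):
        for f in (0, 1):
            p_wf = np.mean((word_occurs == w) & (field_occurs == f))
            p_w = np.mean(word_occurs == w)
            p_f = np.mean(field_occurs == f)
            if p_wf > 0:
                mi += p_wf * np.log(p_wf / (p_w * p_f))
    return mi
```

Optimization would then keep or refine the visual words whose occurrence indicators score highest against the target field on the second document set.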

UNSUPERVISED DOCUMENT REPRESENTATION LEARNING VIA CONTRASTIVE AUGMENTATION
20220164600 · 2022-05-26 ·

Systems and methods for augmenting data sets are provided. The systems and methods include feeding an original document into a data augmentation generator to produce one or more augmented documents; calculating a contrastive loss between the original document and the one or more augmented documents; and using the original document and the one or more augmented documents to train a neural network.
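A minimal sketch of the two pieces named in the abstract: a toy augmentation generator (random token dropout, an assumed scheme) and an InfoNCE-style contrastive loss that pulls the original/augmented embedding pair together while pushing negatives apart. The dropout augmentation and the specific loss form are assumptions; the patent does not commit to either.

```python
import numpy as np

def augment(doc_tokens, rng, drop_prob=0.1):
    """Toy augmentation generator: randomly drop tokens (assumed scheme)."""
    keep = rng.random(len(doc_tokens)) >= drop_prob
    return [t for t, k in zip(doc_tokens, keep) if k] or doc_tokens

def contrastive_loss(z_orig, z_aug, z_negatives, temperature=0.5):
    """InfoNCE-style loss on document embeddings: low when the original and
    its augmentation are close and the negatives are far."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = np.exp(cos(z_orig, z_aug) / temperature)
    neg = sum(np.exp(cos(z_orig, z) / temperature) for z in z_negatives)
    return -np.log(pos / (pos + neg))
```

In training, the loss would be backpropagated through the encoder that produced `z_orig` and `z_aug`; numpy is used here only to show the loss arithmetic.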

WORK RECORD EXTRACTION DEVICE AND WORK RECORD EXTRACTION SYSTEM
20230282014 · 2023-09-07 ·

Provided is a work record extraction device that can correctly select the drawing element corresponding to handwriting, even when the handwriting deviates in position, while collating position coordinates between handwritten data overwritten on drawing data by manual input and drawing elements on the drawing data. The work record extraction device according to the invention sets, around a drawing element, a boundary area including at least a part of the drawing element, determines whether handwritten data passes through at least a part of the boundary area, and, in a case where it does, determines that the handwritten data passes through the drawing element.
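The tolerance mechanism reduces to a simple geometric test: expand each drawing element's bounds by a margin (the boundary area) and check whether any handwritten point falls inside. The axis-aligned bounding box, the fixed margin, and the function names below are illustrative assumptions.

```python
def expand_bbox(bbox, margin):
    """Boundary area: the element's bounding box grown by a margin on all sides."""
    x0, y0, x1, y1 = bbox
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)

def stroke_hits_element(stroke_points, element_bbox, margin=5.0):
    """True if any handwritten point falls inside the expanded boundary area,
    i.e. the handwriting is deemed to pass through the drawing element."""
    x0, y0, x1, y1 = expand_bbox(element_bbox, margin)
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x, y in stroke_points)
```

With `margin=0` this degenerates to an exact hit test, which is precisely the case the invention's boundary area is meant to relax.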

Document Extraction Template Induction

A method for document extraction includes receiving, from a user device associated with a user, an annotated document that includes one or more fields. Each respective field of the one or more fields of the annotated document is labeled by a respective annotation. The method includes clustering, using a template matching algorithm, the annotated document into a cluster and inducing, using the annotated document, a document template for the cluster. The method includes receiving, from the user device, an unannotated document including the one or more fields. The method includes clustering, using the template matching algorithm, the unannotated document into the cluster and, in response to clustering the unannotated document into the cluster, extracting, using the document template, the one or more fields.
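The induce-then-extract flow can be sketched with a dictionary of clusters keyed by a layout signature: annotated documents register field positions as the cluster's template, and unannotated documents matching the same signature have values read out at those positions. The signature function, the document dicts, and position-keyed values are all assumed stand-ins for the patent's template matching algorithm.

```python
def layout_signature(doc):
    """Toy template-matching key: the sorted static label strings (assumption)."""
    return tuple(sorted(doc["labels"]))

def induce_template(clusters, doc):
    """Cluster an annotated document and record its field positions
    as the extraction template for that cluster."""
    key = layout_signature(doc)
    template = clusters.setdefault(key, {})
    for field, pos in doc["annotations"].items():
        template[field] = pos
    return key

def extract_fields(clusters, doc):
    """Cluster an unannotated document, then extract values at the
    positions stored in the matching cluster's template."""
    template = clusters[layout_signature(doc)]
    return {field: doc["values"][pos] for field, pos in template.items()}
```

The point of the induction step is that one annotated exemplar per cluster suffices: every later document that clusters the same way inherits its template.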

SALIENCE-AWARE CROSS-ATTENTION FOR ABSTRACTIVE SUMMARIZATION

A method including: receiving an input comprising natural language texts at an encoder; adding a token to the input; obtaining a last-layer hidden state as a natural language text representation; feeding the natural language text representation into a single-layer classification head; predicting a salience allocation based on the single-layer classification head; developing a salience-aware cross-attention (SACA) decoder to determine salience in the natural language text representation; mapping a plurality of salience degrees to a plurality of trainable salience embeddings; estimating an amount of signal to accept from the plurality of trainable salience embeddings; incorporating the salience allocation and the signal in a cross-attention layer model; and generating a summarization based on the SACA decoder and the cross-attention layer model.
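The core decoder-side idea (salience degrees mapped to trainable embeddings, gated into the cross-attention keys) can be sketched as a single attention call. The scalar `gate` stands in for the estimated amount of signal to accept from the salience embeddings; in the real model it would be learned, and adding the embeddings to the keys is one assumed way of "incorporating" them.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def salience_aware_cross_attention(queries, keys, values, salience_ids,
                                   salience_emb, gate=0.5):
    """Cross-attention whose encoder-side keys are shifted by trainable
    salience embeddings, scaled by a gate (a scalar stand-in here)."""
    keys_shifted = keys + gate * salience_emb[salience_ids]
    scores = queries @ keys_shifted.T / np.sqrt(keys.shape[1])
    return softmax(scores) @ values
```

With `gate=0` the salience signal is ignored and the layer collapses to ordinary cross-attention, which makes the mechanism easy to ablate.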

Methods and systems for generating composite image descriptors

An illustrative image descriptor generation system determines a subset of image descriptors from a plurality of image descriptors that each correspond to a different feature point included within an image. The subset of image descriptors is determined based on geometric proximity, within the image, of respective feature points of the subset of image descriptors to a feature point of a primary image descriptor. The image descriptor generation system then selects a secondary image descriptor from the subset of image descriptors and combines the primary image descriptor and the secondary image descriptor to form a composite image descriptor. Corresponding methods and systems are also disclosed.
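The selection-and-combination step reduces to: rank the other feature points by distance to the primary point, pick the nearest as the secondary, and merge the two descriptors. Concatenation is an assumed combination scheme (the patent covers combining generally), and the single-nearest-neighbor choice is likewise a simplification of "a subset based on geometric proximity."

```python
import numpy as np

def composite_descriptor(primary_idx, keypoints, descriptors):
    """Select the geometrically nearest feature point as the secondary
    descriptor and combine it with the primary one by concatenation."""
    dists = np.linalg.norm(keypoints - keypoints[primary_idx], axis=1)
    dists[primary_idx] = np.inf  # exclude the primary point itself
    secondary_idx = int(dists.argmin())
    # concatenation is an assumed combination scheme, not the patent's only one
    return np.concatenate([descriptors[primary_idx], descriptors[secondary_idx]])
```

The resulting composite descriptor carries local geometric context, which is what makes it more discriminative than either descriptor alone for matching.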

SEMI-SUPERVISED PRICE TAG DETECTION
20180068180 · 2018-03-08 ·

A method comprising: training a price tag detector, comprising a gross feature detector and a classifier, to automatically detect a price tag in an image, by: a) training the gross feature detector using supervised learning with labeled images, and b) training the classifier using a two-phase hybrid learning process comprising: c) applying an initial supervised learning using the labeled images, yielding a semi-trained version of the classifier, and d) applying a subsequent unsupervised learning using unlabeled images, yielding a fully trained version of the classifier, wherein applying the unsupervised learning comprises: for each unlabeled image: i) detecting multiple price tag hypotheses using the gross feature detector, ii) classifying each price tag hypothesis using the semi-trained classifier, iii) rating each classification based on contextual data extracted from the unlabeled image, iv) retraining the semi-trained classifier with the rated classifications, and repeating steps ii) through iv) until the reclassification converges.
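The unsupervised phase (steps i-iv with the convergence check) is a classic self-training loop, sketched below against assumed callable interfaces for the detector, rater, and retrainer; the convergence criterion (fraction of labels that changed between rounds) is also an assumption.

```python
def refine_classifier(classifier, unlabeled_images, detect, rate, retrain,
                      max_rounds=10, tol=0.01):
    """Iterate classify -> rate -> retrain over unlabeled images until the
    reclassification converges (all interfaces are assumed stand-ins)."""
    prev_labels = None
    for _ in range(max_rounds):
        rated = []
        for image in unlabeled_images:
            for hypothesis in detect(image):      # i) gross feature detector
                label = classifier(hypothesis)     # ii) semi-trained classifier
                score = rate(image, hypothesis, label)  # iii) contextual rating
                rated.append((hypothesis, label, score))
        classifier = retrain(classifier, rated)    # iv) retrain on rated labels
        labels = [label for _, label, _ in rated]
        if prev_labels is not None and labels and \
                sum(a != b for a, b in zip(labels, prev_labels)) / len(labels) < tol:
            break  # reclassification has converged
        prev_labels = labels
    return classifier
```

The contextual rating is what keeps the loop from drifting: confident, context-consistent classifications dominate each retraining round.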
