G06V30/19147

Image processing apparatus, image processing method, and storage medium
11568623 · 2023-01-31

An image processing apparatus obtains a read image of a document including a handwritten character, generates a first image formed by pixels of the handwritten character by extracting the pixels of the handwritten character from pixels of the read image using a first learning model for extracting the pixels of the handwritten character, estimates a handwriting area including the handwritten character using a second learning model for estimating the handwriting area, and performs handwriting OCR processing based on the generated first image and the estimated handwriting area.
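A minimal sketch of the two-model pipeline described above, with both "learning models" stubbed by simple heuristics (thresholding and a bounding box); all function names are illustrative, not from the patent:

```python
def extract_handwritten_pixels(image, threshold=128):
    """Stub for the first learning model: mark pixels classified as
    handwriting (here simply: dark pixels) with 1, everything else 0."""
    return [[1 if p < threshold else 0 for p in row] for row in image]

def estimate_handwriting_area(image):
    """Stub for the second learning model: return a bounding box
    (top, left, bottom, right) around the nonzero pixels."""
    coords = [(r, c) for r, row in enumerate(image)
              for c, p in enumerate(row) if p]
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1

def handwriting_ocr_crop(read_image):
    """Run both stubbed models, then crop the extracted first image to
    the estimated handwriting area; a real system would pass the crop
    to a handwriting OCR engine."""
    first_image = extract_handwritten_pixels(read_image)
    top, left, bottom, right = estimate_handwriting_area(first_image)
    return [row[left:right] for row in first_image[top:bottom]]
```

The point of the two-model split is that pixel extraction and area estimation are separate predictions that are only combined at the OCR step.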

Automatic generation of training data for hand-printed text recognition

A method for generating training data for hand-printed text recognition includes obtaining a structured document, obtaining a set of hand-printed character images and database metadata from a database, generating a modified document page image, and outputting a training file. The structured document includes a document page image that includes text characters and document metadata that associates each of the text characters to a document character label. The database metadata associates each of the set of hand-printed character images to a database character label. The modified document page image is generated by iteratively processing each of the text characters. The iterative processing includes determining whether an individual text character should be replaced, selecting a replacement hand-printed character image from the set of hand-printed character images, scaling the replacement hand-printed character image, and inserting the replacement hand-printed character image into the modified document page image.
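The iterative replacement loop can be sketched as follows; strings stand in for glyph images, the scaling step is a stub, and all names are illustrative:

```python
import random

def scale(glyph_image):
    """Stub: scale the hand-printed image to fit the slot it replaces."""
    return glyph_image

def generate_training_page(text_chars, hand_images, replace_prob=0.5,
                           rng=None):
    """text_chars is a list of (character_label, printed_glyph) pairs
    from the structured document; hand_images maps a character label to
    candidate hand-printed glyphs from the database."""
    rng = rng or random.Random(0)
    page, labels = [], []
    for char, glyph in text_chars:
        # decide whether this text character should be replaced
        if char in hand_images and rng.random() < replace_prob:
            glyph = scale(rng.choice(hand_images[char]))
        page.append(glyph)
        labels.append(char)  # the character label is kept either way
    return page, labels
```

Because the document metadata already labels every character, the modified page and its label list form a ready-made training file.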

ASPECT PROMPTING FRAMEWORK FOR LANGUAGE MODELING

Techniques for dynamically developing a contextual set of prompts based on relevant aspects extracted from a set of training data. One technique includes obtaining training data comprising text examples and associated labels, extracting aspects from the training data, generating prompting templates based on the training data and the extracted aspects, concatenating each of the text examples with the respective generated prompting template to create prompting functions, training a machine learning language model on the prompting functions to predict a solution for a task, where the training is formulated as a masked language modeling problem with blanks of the prompting templates being set as text labels and expected output for the task being set as specified solution labels, and the training learns or updates model parameters of the machine learning language model for performing the task. The machine learning language model is provided with the learned or updated model parameters.
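The prompting-function construction can be sketched as below; the mask token, template wording, and function names are illustrative assumptions, not from the patent:

```python
MASK = "[MASK]"

def build_prompting_function(text_example, aspects):
    """Generate a prompting template from the extracted aspects (one
    masked blank per aspect) and concatenate it with the text example,
    so training can be cast as masked language modeling."""
    template = " ".join(f"The {aspect} is {MASK}." for aspect in aspects)
    return f"{text_example} {template}"
```

During training, the blanks would be filled with the example's labels and the model's parameters updated to predict them.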

ARCHITECTURE FOR ML DRIFT EVALUATION AND VISUALIZATION
20230025677 · 2023-01-26

Systems, devices, methods, and computer-readable media for evaluation and visualization of machine learning data drift. A method can include receiving a series of data indicating accuracy and confidence associated with classification of respective batches of input samples, and dynamically displaying, on the GUI, a concurrent plot of the accuracy and confidence as the series of data are received.
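A minimal sketch of the receiving side of this method: an accumulator that takes per-batch accuracy and confidence as they stream in and exposes the concurrent series a GUI would plot. The class and method names, and the confidence-minus-accuracy drift indicator, are illustrative assumptions:

```python
class DriftMonitor:
    """Accumulates the series of data as batches of input samples are
    classified; a GUI would redraw the concurrent plot on each call."""

    def __init__(self):
        self.accuracy, self.confidence = [], []

    def receive(self, batch_accuracy, batch_confidence):
        self.accuracy.append(batch_accuracy)
        self.confidence.append(batch_confidence)

    def drift_signal(self):
        """One simple drift indicator: the per-batch gap between
        confidence and accuracy (a growing gap suggests drift)."""
        return [c - a for a, c in zip(self.accuracy, self.confidence)]
```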

DESIGN OPTIMIZATION AND USE OF CODEBOOKS FOR DOCUMENT ANALYSIS
20230028992 · 2023-01-26

A method of generating and optimizing a codebook for document analysis comprises: receiving a first set of document images; extracting a plurality of keypoint regions from each document image of the first set of document images; calculating local descriptors for each keypoint region of the extracted keypoint regions; clustering the local descriptors such that each center of a cluster of local descriptors corresponds to a respective visual word; generating a codebook containing a set of visual words; and optimizing the codebook by maximizing mutual information (MI) between a target field of a second set of document images and at least one visual word of the set of visual words.
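The clustering step can be sketched with a tiny k-means over local descriptors (2-D points here), where each cluster center becomes one visual word; the MI-based optimization against a target field is omitted, and all names are illustrative:

```python
def dist2(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a cluster of descriptors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n
                 for i in range(len(points[0])))

def build_codebook(descriptors, k=2, iters=10):
    """k-means over local descriptors; the returned centers are the
    visual words of the codebook."""
    centers = descriptors[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            nearest = min(range(k), key=lambda i: dist2(d, centers[i]))
            clusters[nearest].append(d)
        centers = [mean(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers
```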

Phrase recognition model for autonomous vehicles

Aspects of the disclosure relate to training and using a phrase recognition model to identify phrases in images. As an example, a selected phrase list that includes a plurality of phrases may be received. Each phrase of the plurality of phrases includes text. An initial plurality of images may be received. A training image set may be selected from the initial plurality of images by identifying the phrase-containing images that include one or more phrases from the selected phrase list. Each given phrase-containing image of the training image set may be labeled with information identifying the one or more phrases from the selected phrase list included in the given phrase-containing image. The model may be trained based on the training image set such that the model is configured to, in response to receiving an input image, output data indicating whether a phrase of the plurality of phrases is included in the input image.
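The selection-and-labeling step can be sketched as follows, with each image represented by an (id, detected_text) pair standing in for OCR output; the names are illustrative:

```python
def select_training_set(images, phrase_list):
    """Keep only phrase-containing images and label each with the
    phrases from the selected phrase list that it contains."""
    labeled = []
    for image_id, detected_text in images:
        matches = [p for p in phrase_list if p in detected_text]
        if matches:
            labeled.append((image_id, matches))
    return labeled
```

The labeled pairs then serve as the training image set for the phrase recognition model.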

TREND PREDICTION
20230230110 · 2023-07-20

Predicting trends may include obtaining trend data from one or more sources, extracting a plurality of trends from the trend data, and producing permutations combining terms or concepts appearing in the plurality of trends to create trend candidates. A first term or concept from a first trend in the plurality of trends may be combined with a second term or concept from a second trend in the plurality of trends.
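A minimal sketch of the permutation step, pairing each term of one trend with each term of every other trend; trends are modeled as term lists and the names are illustrative:

```python
from itertools import product

def trend_candidates(trends):
    """Combine terms across distinct trends to produce candidate
    (permuted) trends as ordered term pairs."""
    candidates = set()
    for i, first in enumerate(trends):
        for j, second in enumerate(trends):
            if i != j:
                candidates.update(product(first, second))
    return candidates
```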

TREND PREDICTION
20230230109 · 2023-07-20

Predicting trends may include obtaining trend data from two or more sources, extracting meaning from the trend data including meaning from a plurality of trends, and grouping trends from the plurality of trends such that trends that have equivalent meaning but not identical expression are grouped together as an aggregated trend.
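The grouping step can be sketched as below. A real system would need a semantic model of "equivalent meaning"; as a loud simplification, the meaning key here is just the lowercased token set, so trends that differ only in word order or case are aggregated:

```python
def aggregate_trends(trends):
    """Group trends with equivalent meaning but non-identical
    expression into aggregated trends (one list per group)."""
    groups = {}
    for trend in trends:
        key = frozenset(trend.lower().split())  # stand-in meaning key
        groups.setdefault(key, []).append(trend)
    return list(groups.values())
```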

OBJECT DETECTION USING NEURAL NETWORKS

Systems and methods for facilitating an automated detection of an object in a test document are disclosed. A system may include a processor including a dataset generator. The dataset generator may obtain a first input image and a first original document from a data lake. The dataset generator may prune a portion of the first original document to obtain a pruned image. The dataset generator may blend the first input image with the pruned image to generate a modified image. The modified image may include the pruned image bearing the first pre-defined representation. The modified image may be combined with the first original document to generate a training dataset. The training dataset may be utilized to train a neural network based model to obtain a trained model for the automated detection of the object in the test document.
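The prune-blend-combine steps of the dataset generator can be sketched as follows, with images as 2-D lists of pixel values; the blending weight and all names are illustrative assumptions:

```python
def prune(document, top, left, bottom, right):
    """Cut a rectangular region out of the original document image."""
    return [row[left:right] for row in document[top:bottom]]

def blend(input_image, pruned_image, alpha=0.5):
    """Blend the input image with the pruned region, pixel by pixel."""
    return [[alpha * a + (1 - alpha) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(input_image, pruned_image)]

def make_training_pair(input_image, document, box):
    """Combine the modified (blended) image with the original document
    to form one sample of the training dataset."""
    modified = blend(input_image, prune(document, *box))
    return modified, document
```

The neural network is then trained on such (modified image, original document) pairs to detect the object in unseen test documents.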

Enhanced supervised form understanding

Interfaces and systems are provided for harvesting ground truth from forms to be used in training models based on key-value pairings in the forms and to later use the trained models to identify related key-value pairings in new forms. Initially, forms are identified and clustered to identify a subset of forms to label with the key-value pairings. Users provide input to identify keys to use in labeling and then select/highlight text from forms that are presented concurrently with the keys in order to associate the highlighted text with the key(s) as the corresponding key-value pairing(s). After labeling the forms with the key-value pairings, the key-value pairing data is used as ground truth for training a model to independently identify the key-value pairing(s) in new forms. Once trained, the model is used to identify the key-value pairing(s) in new forms.
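The harvesting step can be sketched as a small accumulator that turns user highlight events into ground-truth key-value pairings; the event shape and function name are illustrative:

```python
def label_form(highlights):
    """highlights: (key, selected_text) events from the labeling UI,
    where the user highlighted text while a key was selected. Returns
    ground-truth key-value pairings for model training."""
    ground_truth = {}
    for key, selected_text in highlights:
        ground_truth.setdefault(key, []).append(selected_text)
    return ground_truth
```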