Patent classifications
G06V30/1918
RECOGNIZING ANIMATION USING OPTICAL CHARACTER RECOGNITION
A method, computer program product, and computer system are provided for recognizing animation through optical character recognition (OCR). Data corresponding to a video or animation containing one or more frames is received. Individual frames or representative images are extracted from the received data. Text in the extracted individual frames or representative images is recognized by performing optical character recognition on them. Relationships between the recognized text across multiple frames are identified. A textual or graphical representation of the video or animation is generated based on the identified relationships.
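An illustrative sketch of the pipeline this abstract describes (per-frame OCR, then linking recognized text across frames). The `ocr_frame` stand-in and the similarity-based grouping are assumptions for illustration, not the patented implementation:

```python
from difflib import SequenceMatcher

def ocr_frame(frame):
    # Stand-in for a real OCR engine; here each "frame" is already a string.
    return frame

def link_frames(frames, threshold=0.6):
    # Group consecutive frames whose recognized text is similar,
    # approximating the cross-frame relationship step.
    texts = [ocr_frame(f) for f in frames]
    if not texts:
        return []
    groups = [[texts[0]]]
    for prev, cur in zip(texts, texts[1:]):
        if SequenceMatcher(None, prev, cur).ratio() >= threshold:
            groups[-1].append(cur)
        else:
            groups.append([cur])
    return groups

def summarize(frames):
    # Textual representation: one line per group of related frames.
    return [group[0] for group in link_frames(frames)]
```

Grouping near-identical OCR results ("HELLO" vs. the misread "HELL0") collapses an animation into one line per distinct on-screen text.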
System and method for assisting in computer interpretation of surfaces carrying symbols or characters
The present disclosure relates to image processing and analysis, and in particular to automatic segmentation of identifiable items in an image, for example the segmentation and identification of characters or symbols. Upon user indication, multiple images of a subject are captured, with variations between the images created by changing lighting, spectral content, viewing angles, and other factors. The images are processed together so that characters and symbols can be recognized on the surface of the imaged subject.
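One plausible way to process varied captures together is a per-pixel median over grayscale grids; `combine_captures` is an assumption for illustration, not the disclosed method:

```python
from statistics import median

def combine_captures(images):
    # Fuse multiple captures of the same surface (varied lighting, angle)
    # by taking the per-pixel median, which suppresses glare and shadow
    # that appear in only some of the captures.
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [median(img[r][c] for img in images) for c in range(cols)]
        for r in range(rows)
    ]
```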
PROCESSING METHOD AND APPARATUS AND ELECTRONIC DEVICE
A processing method includes obtaining character information, the character information being used to represent a search target; obtaining an image set, the image set including a plurality of images; and, based on the character information, the image set, and an intelligent engine, obtaining an image search result. Based on the character information and a first model in the intelligent engine, a first set is obtained. The first set includes a plurality of first images. Based on the first set, the image set, and a second model in the intelligent engine, a second set is obtained. The second set includes a plurality of second images, the second images are used as the image search result, and the first model is different from the second model.
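A minimal sketch of the two-stage flow the abstract describes, with hypothetical stand-ins (`first_model`, `second_model`, tag matching, a precomputed score) in place of the actual intelligent engine:

```python
def first_model(character_info, catalog):
    # Hypothetical text-driven retrieval: keep images whose tags
    # mention the search target.
    return [img for img in catalog if character_info in img["tags"]]

def second_model(first_set, image_set):
    # Hypothetical second-stage model: restrict candidates to the image
    # set and re-rank them by a precomputed visual score.
    candidates = [img for img in first_set if img in image_set]
    return sorted(candidates, key=lambda img: img["score"], reverse=True)

def search(character_info, catalog, image_set):
    # Chain the two different models to produce the image search result.
    return second_model(first_model(character_info, catalog), image_set)
```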
DYNAMIC DOCUMENT CLASSIFICATION
In an approach, a processor performs document layout analysis on a received document, generating a plurality of textual regions; extracts characteristics from each of the plurality of textual regions and associates the respective characteristics with the respective textual region as metadata; classifies each of the plurality of textual regions as an optical character recognition (OCR) region, a non-OCR valuable region, or a non-OCR non-valuable region using a classifier; performs OCR on each OCR region, generating an OCR output; identifies associated constant OCR data from a constant OCR data repository for each non-OCR valuable region; merges the associated constant OCR data with the OCR output, generating complete OCR data for the received document; performs data extraction on the complete OCR data to identify data fields and key-value pairs, generating extracted data; and determines whether the extracted data is valid based on a set of rules.
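A toy sketch of the classify-then-merge step, assuming a trivial metadata-based classifier and a hypothetical constant-data repository (`CONSTANT_OCR`) in place of the trained classifier and OCR engine:

```python
CONSTANT_OCR = {"logo": "ACME Corp."}  # hypothetical constant-OCR repository

def classify_region(region):
    # Toy classifier over region metadata; a real system would use a
    # trained model on the extracted characteristics.
    if region.get("has_text") and region.get("variable"):
        return "ocr"
    if region.get("has_text"):
        return "non_ocr_valuable"
    return "non_ocr_non_valuable"

def process(regions, ocr=lambda region: region["text"]):
    # OCR the variable regions, look up constant data for the valuable
    # non-OCR regions, and merge both into one complete result.
    complete = {}
    for region in regions:
        kind = classify_region(region)
        if kind == "ocr":
            complete[region["name"]] = ocr(region)
        elif kind == "non_ocr_valuable":
            complete[region["name"]] = CONSTANT_OCR.get(region["name"], "")
    return complete
```

Skipping OCR on regions whose content never changes (a printed logo, a fixed form header) is what makes the classification pay off.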
EXTENSIBLE ARCHITECTURE WITH MULTIMODAL FEATURE FUSION FOR DOCUMENT CLASSIFICATION
Methods and systems are presented for classifying a digital image of a document using a machine learning model framework. The machine learning model framework is configured to provide a classification output based on a fusion of features corresponding to different modalities and extracted from the digital image. The machine learning model framework includes multiple encoders. Each encoder is configured to encode features corresponding to a distinct modality into a respective embedding. Different embeddings generated by the multiple encoders are fused together using one or more fusion techniques. The fused embedding is provided to a machine learning model for classifying the document.
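The fusion idea can be sketched with toy per-modality encoders and the simplest fusion technique, concatenation; every function and feature here is an illustrative assumption, not the framework's actual encoders:

```python
def encode_text(doc):
    # Toy text-modality encoder: character and word counts.
    return [len(doc["text"]), doc["text"].count(" ") + 1]

def encode_layout(doc):
    # Toy layout-modality encoder: page dimensions.
    return [doc["width"], doc["height"]]

def fuse(embeddings):
    # Simplest fusion technique: concatenate the per-modality embeddings.
    fused = []
    for emb in embeddings:
        fused.extend(emb)
    return fused

def classify(doc):
    fused = fuse([encode_text(doc), encode_layout(doc)])
    # Stand-in classifier: threshold on one fused feature.
    return "invoice" if fused[0] > 10 else "receipt"
```

The architecture is extensible because adding a modality only means adding one encoder and widening the fused embedding.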
VEHICLE LICENSE PLATE RECOGNITION METHOD, DEVICE, TERMINAL AND COMPUTER-READABLE STORAGE MEDIUM
A vehicle license plate recognition method includes: performing vehicle license plate recognition on an obtained fisheye image to obtain a vehicle license plate region; in response to the vehicle license plate region being in a non-reference direction, enlarging the vehicle license plate region based on all pixels of the vehicle license plate region to obtain a deformed image of the vehicle license plate, the resolution of the deformed image being higher than the resolution of the vehicle license plate region; performing reference direction correction on the deformed image of the vehicle license plate to obtain a to-be-detected image; and recognizing the to-be-detected image to obtain an output character corresponding to the vehicle license plate region.
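The enlarge-then-correct steps can be illustrated on a character grid standing in for a pixel image; replication and a 90-degree rotation are simplifying assumptions, not the patented deformation and correction:

```python
def enlarge(plate, factor=2):
    # Enlarge by replicating characters ("pixels"); a real system would
    # interpolate image pixels to raise resolution.
    enlarged = []
    for row in plate:
        wide = "".join(ch * factor for ch in row)
        enlarged.extend([wide] * factor)
    return enlarged

def rotate_to_reference(plate):
    # Reference-direction correction: rotate the grid 90 degrees clockwise.
    return ["".join(row) for row in zip(*reversed(plate))]
```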
Method and device for training, based on crossmodal information, document reading comprehension model
A method for training a document reading comprehension model includes: acquiring a question sample and a rich-text document sample, in which the rich-text document sample includes a real answer of the question sample; acquiring text information and layout information of the rich-text document sample by performing OCR processing on image information of the rich-text document sample; acquiring a predicted answer of the question sample by inputting the text information, the layout information and the image information of the rich-text document sample into a preset reading comprehension model; and training the reading comprehension model based on the real answer and the predicted answer. The method may enhance the reading comprehension model's comprehension of long rich-text documents and reduce labor costs.
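One supervised step of this loop might look as follows; the character-overlap "model" and 0/1 loss are toy stand-ins for the actual cross-modal model and training objective:

```python
def predict_answer(question, text_tokens, layout_info, image_info):
    # Stand-in reading-comprehension model: return the OCR token sharing
    # the most characters with the question. A real model would also use
    # the layout and image inputs, which this toy ignores.
    return max(text_tokens, key=lambda tok: len(set(tok) & set(question)))

def train_step(sample, model=predict_answer):
    # Compare the predicted answer with the real answer; the 0/1 "loss"
    # stands in for the model's actual training signal.
    predicted = model(sample["question"], sample["tokens"],
                      sample.get("layout"), sample.get("image"))
    loss = 0.0 if predicted == sample["answer"] else 1.0
    return predicted, loss
```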
Inter-word score calculation apparatus, question and answer extraction system and inter-word score calculation method
An inter-word score calculation apparatus calculates a degree of relatedness between words included in data from at least one document. The inter-word score calculation apparatus includes a memory storing document data from the documents and term list data in which predetermined terms are written, and a processor. The processor performs a combination process of amplifying an amplification candidate word, which is a word that corresponds to a term in the term list data and is included in the document data, to create an amplified word, and adding the amplified word to the document data to create processed document data; calculates the degree of relatedness between words included in the processed document data using a predetermined calculation method; and, when the amount of documents accumulated in the document data is smaller than a first predetermined amount, adds the amplification candidate word to the processed document data.
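A sketch of the amplification idea under stated assumptions: duplicating term-list words (with an extra copy when the corpus is still small) and a windowed co-occurrence count as the "predetermined calculation method":

```python
def amplify(document_words, term_list, doc_count, first_amount=5):
    # Copy ("amplify") each word found in the term list; add one more
    # copy when the accumulated document count is still small, so rare
    # but important terms carry weight in co-occurrence statistics.
    processed = list(document_words)
    for word in document_words:
        if word in term_list:
            processed.append(word)
            if doc_count < first_amount:
                processed.append(word)
    return processed

def relatedness(words, a, b, window=2):
    # Toy degree-of-relatedness: count occurrences of b near each a.
    score = 0
    for i, word in enumerate(words):
        if word == a:
            score += words[max(0, i - window): i + window + 1].count(b)
    return score
```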
Devices and Methods for Enhancing Data Extraction from Images
Systems and methods for enhancing trainable optical character recognition (OCR) performance are disclosed herein. An example method includes receiving, at an application executing on a user computing device communicatively coupled to a machine vision camera, an image captured by the machine vision camera, the image including an indicia encoding a payload and a character string. The example method also includes identifying the indicia and the character string; decoding the indicia to determine the payload; and applying an optical character recognition (OCR) algorithm to the image to interpret the character string and identify an unrecognized character within the character string. The example method also includes comparing the payload to the character string to validate the unrecognized character as corresponding to a known character included within the payload; and responsive to validating the unrecognized character, adding the unrecognized character to a font library referenced by the OCR algorithm.
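The validation step can be sketched directly: align the OCR string against the decoded payload, resolve the unrecognized characters, and collect them for the font library. The `"?"` marker for an unrecognized character and the one-to-one alignment are simplifying assumptions:

```python
def validate_ocr(payload, ocr_string, unknown="?"):
    # Compare the barcode payload with the OCR'd character string: each
    # unrecognized character is validated against the payload, and the
    # validated characters are returned for addition to the font library.
    if len(payload) != len(ocr_string):
        return None, []
    corrected, learned = [], []
    for expected, observed in zip(payload, ocr_string):
        if observed == unknown:
            corrected.append(expected)
            learned.append(expected)
        elif observed != expected:
            return None, []  # mismatch: payload cannot validate the string
        else:
            corrected.append(observed)
    return "".join(corrected), learned
```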
CHARACTER PROCESSING METHOD AND APPARATUS, AND ELECTRONIC DEVICE AND STORAGE MEDIUM
A character processing method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a first image comprising a to-be-processed character; training a target stroke order determination model that combines a spatial attention mechanism and a channel attention mechanism; and inputting the first image into the pre-trained target stroke order determination model to obtain a target stroke order corresponding to the to-be-processed character.
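The two attention mechanisms can be illustrated on plain nested lists; these mean-activation weightings are simplified stand-ins for the learned attention modules inside the stroke order model:

```python
def channel_attention(fmap):
    # fmap: list of channels, each a flat list of activations. Weight
    # each channel by its normalized mean activation (a toy squeeze step).
    means = [sum(ch) / len(ch) for ch in fmap]
    total = sum(means) or 1.0
    return [[v * (m / total) for v in ch] for ch, m in zip(fmap, means)]

def spatial_attention(fmap):
    # Weight each spatial position by its normalized mean across channels.
    n = len(fmap)
    pos = [sum(ch[i] for ch in fmap) / n for i in range(len(fmap[0]))]
    total = sum(pos) or 1.0
    return [[v * (p / total) for v, p in zip(ch, pos)] for ch in fmap]
```

Channel attention emphasizes which feature maps matter; spatial attention emphasizes where in the character image to look; the model combines both.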