G06V30/2455

Automatic generation of training data for hand-printed text recognition

A method for generating training data for hand-printed text recognition includes obtaining a structured document, obtaining a set of hand-printed character images and database metadata from a database, generating a modified document page image, and outputting a training file. The structured document includes a document page image that includes text characters and document metadata that associates each of the text characters with a document character label. The database metadata associates each of the hand-printed character images with a database character label. The modified document page image is generated by iteratively processing each of the text characters. The iterative processing includes determining whether an individual text character should be replaced, selecting a replacement hand-printed character image from the set of hand-printed character images, scaling the replacement hand-printed character image, and inserting the replacement hand-printed character image into the modified document page image.
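The iterative replacement described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the page is assumed to be a 2D list of pixel values, each character is assumed to carry a `label` and a `(top, left, height, width)` box, and the names `generate_training_page`, `scale_nearest`, `glyph_db`, and `replace_prob` are all hypothetical. The random replacement decision and the nearest-neighbour scaling are assumptions; the abstract does not say how either step is performed.

```python
import random


def scale_nearest(glyph, out_h, out_w):
    """Resize a 2D list of pixels using nearest-neighbour sampling."""
    in_h, in_w = len(glyph), len(glyph[0])
    return [
        [glyph[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def generate_training_page(page, chars, glyph_db, replace_prob=0.5, rng=None):
    """Return (modified_page, labels) with some characters hand-printed.

    page     -- 2D list of pixel values (assumed representation)
    chars    -- list of {"label": str, "box": (top, left, height, width)}
    glyph_db -- dict mapping a label to a list of hand-printed glyph images
    """
    rng = rng or random.Random()
    modified = [row[:] for row in page]  # leave the input page untouched
    labels = []
    for ch in chars:
        top, left, height, width = ch["box"]
        candidates = glyph_db.get(ch["label"], [])
        # Assumption: replace at random, and only when a matching glyph exists.
        replaced = bool(candidates) and rng.random() < replace_prob
        if replaced:
            glyph = scale_nearest(rng.choice(candidates), height, width)
            for r in range(height):
                modified[top + r][left:left + width] = glyph[r]
        labels.append(
            {"label": ch["label"], "box": ch["box"], "hand_printed": replaced}
        )
    return modified, labels
```

The returned label list plays the role of the training file's ground truth in this sketch; how the actual training file is serialized is not stated in the abstract.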

HANDWRITTEN POSTAGE

The technology described herein provides a handwritten postage that comprises handwriting on a postal item that forms a unique identifier for the postal item (e.g., envelope, postcard, sticker) when analyzed by a computer vision application. The unique identifier is computer derived from the handwritten postage and allows one instance of handwritten postage to be differentiated from all other instances of handwritten postage. The unique identifier may be derived from an image of an envelope that includes an instance of handwritten postage when the handwritten postage is activated. The unique identifier may be formed from a combination of handwriting content (e.g., to and from address), metadata (e.g., date activated), pre-printed content on the postal item (e.g., fiducial marks), post-printed content (e.g., to or from address) and the visual image created by all or a portion of the handwriting. Postage value is added to the handwritten postage through an activation process.

HANDWRITTEN CONTENT REMOVING METHOD AND DEVICE AND STORAGE MEDIUM
20230037272 · 2023-02-02

A handwritten content removing method and device and a storage medium are provided. The handwritten content removing method comprises: acquiring an input image of a text page to be processed, the input image comprising a handwritten region that contains handwritten content (S10); analyzing the input image to determine the handwritten content in the handwritten region (S11); and removing the handwritten content from the input image to obtain an output image (S12).

METHODS AND SYSTEMS FOR PERFORMING ON-DEVICE IMAGE TO TEXT CONVERSION

A method and system for performing on-device image to text conversion are provided. Embodiments herein relate to the field of image to text conversion, and more particularly to performing on-device image to text conversion with improved accuracy. The method includes detecting a language from an image, understanding text in an edited image, and using a contextual and localized lexicon set for post optical character recognition (OCR) correction.
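As a rough illustration of lexicon-based post-OCR correction, the sketch below snaps each OCR token to its closest lexicon entry using the standard-library `difflib.get_close_matches`. The function name `correct_tokens`, the `cutoff` value, and the use of string similarity are assumptions made for illustration; the abstract does not describe how the lexicon is applied.

```python
import difflib


def correct_tokens(tokens, lexicon, cutoff=0.8):
    """Replace each OCR token with its closest lexicon entry, if any.

    tokens  -- list of strings produced by OCR
    lexicon -- list of known-good words (the localized lexicon, assumed)
    cutoff  -- minimum similarity in [0, 1] required to accept a match
    """
    corrected = []
    for token in tokens:
        if token in lexicon:
            corrected.append(token)
            continue
        match = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
        # Keep the original token when nothing in the lexicon is close enough.
        corrected.append(match[0] if match else token)
    return corrected
```

For example, with the lexicon `["invoice", "total", "amount"]`, the token `"invo1ce"` would be mapped to `"invoice"`, while an unrelated token is left unchanged.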

Utilizing machine learning and image filtering techniques to detect and analyze handwritten text

In some implementations, a device may receive an image that depicts handwritten text. The device may determine that a section of the image includes the handwritten text. The device may analyze, using a first image processing technique, the section to identify subsections of the section that include individual words of the handwritten text. The device may reconfigure, using a second image processing technique, the subsections to create preprocessed word images associated with the individual words. The device may analyze, using a word recognition model, the preprocessed word images to generate digitized words that are associated with the preprocessed word images. The device may verify, based on a reference data structure, that the digitized words correspond to recognized words of the word recognition model. The device may generate, based on verifying the digitized words, digital text according to a sequence of the digitized words in the section.
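Two of the steps in this abstract, splitting a text line into word subsections and assembling verified words in sequence, can be sketched as below. This is only an illustration under stated assumptions: the line image is assumed to be a 2D list where nonzero means ink, words are assumed to be separated by at least `min_gap` blank columns, and the vocabulary set stands in for the reference data structure. The names `split_words` and `assemble_text` are hypothetical, and dropping unverified words is an assumption rather than something the abstract specifies.

```python
def split_words(line, min_gap=2):
    """Return (start, end) column spans of words in a binary line image.

    A word ends when at least `min_gap` consecutive columns contain no ink.
    `end` is exclusive.
    """
    width = len(line[0])
    ink = [any(row[c] for row in line) for c in range(width)]
    spans, start, blank = [], None, 0
    for c, has_ink in enumerate(ink):
        if has_ink:
            if start is None:
                start = c
            blank = 0
        elif start is not None:
            blank += 1
            if blank >= min_gap:
                spans.append((start, c - blank + 1))
                start, blank = None, 0
    if start is not None:
        spans.append((start, width - blank))
    return spans


def assemble_text(digitized_words, vocabulary):
    """Join, in order, the words that appear in the reference vocabulary."""
    verified = [w for w in digitized_words if w.lower() in vocabulary]
    return " ".join(verified)
```

The word recognition model itself is out of scope here; `assemble_text` only shows where its output would be checked and ordered.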

Detecting machine text
11468232 · 2022-10-11

A system receives a historical text block and creates historical features for the historical text block's historical text lines. The system trains a machine-learning model to cluster the historical features into historical features clusters based on their similarities. The system identifies a historical features cluster as a historical human text cluster. The system classifies each historical text line for the historical human text cluster as human text, and each historical text line for the other historical features clusters as machine text. The system receives a text block and creates features for the text block's text lines. The system applies the trained machine-learning model to cluster the features into features clusters based on their similarities. The system identifies a features cluster as a human text cluster. The system classifies each text line for the human text cluster as human text, and each text line for the other features clusters as machine text. The system applies human text analysis to each text line classified as human text and machine text analysis to each text line classified as machine text.
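The clustering idea can be illustrated with a deliberately small sketch: one feature per line and a two-cluster k-means. Everything specific here is an assumption, since the abstract names neither the features nor the clustering model nor the rule for picking the human cluster. The feature (share of letters and spaces in a line), the helper names `letter_ratio`, `two_means`, and `classify_lines`, and the rule "the cluster with the higher letter ratio is human text" are hypothetical choices.

```python
def letter_ratio(line):
    """Hypothetical feature: fraction of characters that are letters or spaces."""
    if not line:
        return 0.0
    return sum(ch.isalpha() or ch.isspace() for ch in line) / len(line)


def two_means(values, iterations=20):
    """Cluster 1D values into two groups; returns a 0/1 assignment per value."""
    low, high = min(values), max(values)  # deterministic initial centroids
    assignment = [0] * len(values)
    for _ in range(iterations):
        assignment = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        group0 = [v for v, a in zip(values, assignment) if a == 0]
        group1 = [v for v, a in zip(values, assignment) if a == 1]
        if group0:
            low = sum(group0) / len(group0)
        if group1:
            high = sum(group1) / len(group1)
    return assignment


def classify_lines(lines):
    """Label each line 'human' or 'machine' from its cluster membership."""
    assignment = two_means([letter_ratio(line) for line in lines])
    # Assumption: cluster 1 (higher letter ratio) is the human text cluster.
    return ["human" if a == 1 else "machine" for a in assignment]
```

If every line has the same feature value, this sketch puts all lines in cluster 0 and labels them machine text; a real system would need a better-founded rule for identifying the human cluster.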

METHOD, APPARATUS, AND SYSTEM FOR AUTO-REGISTRATION OF NESTED TABLES FROM UNSTRUCTURED CELL ASSOCIATION FOR TABLE-BASED DOCUMENTATION
20220318235 · 2022-10-06

In some forms containing keywords and content, there may be nested levels of keywords, also referred to as a hierarchy. Content in the forms may be associated with one or more keywords in one or more of the nested levels, or in the hierarchy. Identifying keywords in adjacent cells in a table (with a nested keyword being either to the right of or below another keyword) enables distinguishing between keywords and content in filled forms, and enables correct association of content with respective keywords.

Information processing apparatus, non-transitory computer readable medium, and character recognition system
11659106 · 2023-05-23

An information processing apparatus includes a processor configured to acquire a result of character recognition of a character string formed on a medium and read by scanning that is subject to character recognition and replace a character or a symbol in a subject with a reference character string that is referred to by the character or the symbol.

Processing digitized handwriting

A handwritten text processing system processes a digitized document including handwritten text input to generate an output version of the digitized document that allows users to execute text processing functions on the textual content of the digitized document. Each word of the digitized data is extracted by converting the digitized document into images, binarizing the images, and segmenting the images into binary image patches. Each binary image patch is further processed to identify if the word is machine-generated or if the word is handwritten. The output version is generated by combining underlying images of the pages of the digitized document with words from the pages superimposed in a transparent font at positions that coincide with the positions of the words in the underlying images.
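The binarization and patch segmentation steps can be sketched as follows. This is an illustrative approximation only: a fixed global threshold and 4-connected component labelling are assumed, whereas the abstract does not say which binarization or segmentation method is used. The names `binarize` and `component_boxes` are hypothetical, and the grayscale image is assumed to be a 2D list of 0 to 255 values with dark ink on a light background.

```python
def binarize(gray, threshold=128):
    """Map dark pixels (below threshold) to 1 (ink) and the rest to 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def component_boxes(binary):
    """Return (top, left, bottom, right) boxes of 4-connected ink regions.

    Coordinates are inclusive; boxes are listed in row-major scan order of
    each region's first pixel.
    """
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r0 in range(height):
        for c0 in range(width):
            if not binary[r0][c0] or seen[r0][c0]:
                continue
            # Flood-fill one region with an explicit stack.
            stack = [(r0, c0)]
            seen[r0][c0] = True
            top, left, bottom, right = r0, c0, r0, c0
            while stack:
                r, c = stack.pop()
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and binary[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            boxes.append((top, left, bottom, right))
    return boxes
```

Each box could then be cropped into a binary image patch and passed to a classifier that decides whether the patch is machine-generated or handwritten; that classifier is not sketched here.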