G06V30/133

SYSTEMS AND METHODS FOR MEASURING DOCUMENT LEGIBILITY
20240073347 · 2024-02-29

Disclosed embodiments may include a system for measuring document legibility. The system may automatically receive document image data from a user device. The system may then process the image data using optical character recognition to create language data containing a plurality of words. The system may then obtain an overall number by counting the plurality of words in the language data. The system may then identify and count the common words within the plurality of words by comparing the plurality of words to words in a database. A score may be obtained by dividing the common word number by the overall number. The score may then be compared to a legibility threshold. If the score is below the threshold, the system may determine the document is illegible. If the score is above the threshold, the system may determine the document is legible.
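
The scoring step in this abstract can be illustrated with a minimal Python sketch. It assumes the OCR text is already available as a string. The small `COMMON_WORDS` set, the word-splitting regular expression, and the `0.2` threshold are hypothetical placeholders, not values from the disclosure. The abstract does not say how a score exactly equal to the threshold is handled; the sketch treats it as legible.

```python
import re

# Hypothetical stand-in for the "database" of common words; a real system
# would presumably load a much larger list.
COMMON_WORDS = {"the", "of", "and", "to", "a", "in", "is", "that", "for", "this"}


def legibility_score(ocr_text, common_words=COMMON_WORDS):
    """Return the common-word count divided by the overall word count."""
    # Assumption: a "word" is a run of letters or apostrophes, case-insensitive.
    words = re.findall(r"[A-Za-z']+", ocr_text.lower())
    if not words:
        return 0.0  # no words recognized: use the lowest possible score
    common = sum(1 for w in words if w in common_words)
    return common / len(words)


def is_legible(ocr_text, threshold=0.2):
    """Compare the score to a legibility threshold (0.2 is illustrative only)."""
    # Assumption: a score equal to the threshold counts as legible.
    return legibility_score(ocr_text) >= threshold
```

Calling `is_legible(text)` on the OCR output of a scanned page would then give the legible/illegible decision under these assumptions.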

SYSTEMS AND METHODS FOR BLUR IDENTIFICATION AND CORRECTION

Methods and systems are described herein for identifying the location and nature of any blur within one or more images received as a user communication and generating an appropriate correction. The system utilizes a first machine learning model, which is trained to identify blurred components of inputted images and determine whether the blurred components are located in portions of the inputted images comprising textual information. The system may apply a corrective action selected by the first machine learning model, which may comprise stitching blurred images together to a sharp product image and/or some other method appropriate for rectifying images received.

METHOD OF CLASSIFYING A DOCUMENT FOR A STRAIGHT-THROUGH PROCESSING

Disclosed is a method of classifying a document for straight-through processing (STP) using memory-enabled modelling. The method includes receiving an input image, performing a content extraction from the input image, and selecting a template with the highest matching probability from a database. The method further includes postprocessing the extracted content based on the predicted template and validating the extracted content. Thereafter, the method includes classifying the document for the STP if the extracted content is successfully validated corresponding to each of the image quality validation, the refined content extraction validation, and the layout validation.

Mechanical handwriting quality control method
11964497 · 2024-04-23

The invention comprises an apparatus and method of use thereof for processing a machine produced emulated handwritten document prepared on a marking surface, comprising the steps of: receiving a print job order; digitally generating a reference image of the print job order; machine plotting with a plotting pen and a downward force of one-half to forty ounces applied to said plotting pen an indentation trail to form input text on the marking surface; digitally imaging the input text on the marking surface to form an actual image; and in a quality control step, digitally comparing the reference image to the actual image prior to insertion of the print job order into an envelope.

Text extraction using optical character recognition

Provided herein are systems and methods for extracting text from a document. Different optical character recognition (OCR) tools are used to extract different versions of the text in the document. Metrics evaluating the quality of the extracted text are compared to identify and select higher quality extracted text. A selected portion of text is compared to a threshold to ensure minimal quality. The selected portion of text is then saved. Error correction can be applied to the selected portion of text based on errors specific to the OCR tools or the document contents.
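
The select-then-threshold flow described here might look like the following Python sketch. It assumes each OCR tool has already produced a text string. The function names, the `alpha_ratio` metric, and the `min_quality` floor are hypothetical; the abstract does not name the metrics it compares, and the error-correction step is not shown.

```python
from typing import Callable, Dict, Optional, Tuple


def alpha_ratio(text: str) -> float:
    """Illustrative metric: share of non-space characters that are alphanumeric."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalnum() for c in chars) / len(chars)


def select_best_text(
    candidates: Dict[str, str],
    quality: Callable[[str], float],
    min_quality: float,
) -> Optional[Tuple[str, str, float]]:
    """Pick the highest-scoring OCR output, or None if it misses the floor."""
    if not candidates:
        return None
    # Compare the quality metric across the versions produced by each tool.
    tool, text = max(candidates.items(), key=lambda item: quality(item[1]))
    score = quality(text)
    # Enforce a minimum quality before the text would be saved.
    if score < min_quality:
        return None
    return tool, text, score
```

A caller would pass a mapping such as `{"tool_a": text_a, "tool_b": text_b}` and save the returned text only when the result is not `None`.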

Information processing system and reading device for acquiring target content information from recording medium and outputting information about erroneous recording medium
11954927 · 2024-04-09

An information processing system includes a processor configured to: make an attempt to acquire, from recording medium images obtained by reading a plurality of recording media, target content information recorded on each of the plurality of recording media; and among the plurality of recording media, output information about an erroneous recording medium, the erroneous recording medium being (a) a recording medium of which the attempted acquisition of the target content information has been unsuccessful or (b) a recording medium of which the target content information acquired by the attempted acquisition does not satisfy a predetermined condition.

Systems and methods for measuring document legibility
11956400 · 2024-04-09

Disclosed embodiments may include a system for measuring document legibility. The system may automatically receive document image data from a user device. The system may then process the image data using optical character recognition to create language data containing a plurality of words. The system may then obtain an overall number by counting the plurality of words in the language data. The system may then identify and count the common words within the plurality of words by comparing the plurality of words to words in a database. A score may be obtained by dividing the common word number by the overall number. The score may then be compared to a legibility threshold. If the score is below the threshold, the system may determine the document is illegible. If the score is above the threshold, the system may determine the document is legible.

AN IMAGE PROCESSING METHOD AND AN IMAGE PROCESSING SYSTEM
20190266447 · 2019-08-29

An image processing method for recognising characters included in an image. A first character recognition unit performs recognition of a first group of characters corresponding to a first region of the image. A measuring unit calculates a confidence measure of the first group of characters. A determination unit determines whether further recognition is to be performed based on the confidence measure. A selection unit selects a second region of the image that includes the first region, if it is determined that further recognition is to be performed. A second character recognition unit performs further recognition of a second group of characters corresponding to the second region of the image.
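
The confidence-driven second pass can be sketched in Python as follows. The two recognisers are passed in as plain callables returning `(text, confidence)`. The fixed pixel `margin` used to build the second region and the `min_confidence` value are assumptions for illustration; the abstract only requires that the second region include the first.

```python
from typing import Callable, Tuple

Region = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
Recogniser = Callable[[Region], Tuple[str, float]]  # returns (text, confidence)


def expand_region(region: Region, margin: int, width: int, height: int) -> Region:
    """Grow a region by `margin` pixels on every side, clipped to the image."""
    left, top, right, bottom = region
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(width, right + margin),
        min(height, bottom + margin),
    )


def recognise_with_fallback(
    first_unit: Recogniser,
    second_unit: Recogniser,
    first_region: Region,
    image_size: Tuple[int, int],
    min_confidence: float = 0.8,  # illustrative value, not from the source
    margin: int = 20,             # illustrative value, not from the source
) -> Tuple[str, float, Region]:
    """Run the first recogniser; rerun on a larger region if confidence is low."""
    text, confidence = first_unit(first_region)
    if confidence >= min_confidence:
        return text, confidence, first_region
    width, height = image_size
    # The second region contains the first, giving the recogniser more context.
    second_region = expand_region(first_region, margin, width, height)
    text, confidence = second_unit(second_region)
    return text, confidence, second_region
```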

Method for assessing the quality of an image of a document
10395393 · 2019-08-27

A method comprising: processing the image to a text image with a number of text blobs; classifying the text blobs based on a calculation as to whether they will belong to a foreground layer or to a background layer in OCR processing; and generating a quality value of the image based on the classified text blobs. By generating the quality value based on the classified text blobs, pictures in the image, which are not relevant for OCR, are not taken into account for assessing the quality of the image. The amount of data to be processed is thereby decreased, resulting in a method which can be executed in real time. Furthermore, as the quality assessment criterion is based on the division of blobs into a foreground and a background layer, i.e. on prior knowledge of the OCR system, it provides a good indication of OCR accuracy.

AUTOMATED CATEGORIZATION AND PROCESSING OF DOCUMENT IMAGES OF VARYING DEGREES OF QUALITY
20240161522 · 2024-05-16

An apparatus includes a memory and a processor. The memory stores a dictionary and a machine learning algorithm trained to classify text. The processor receives an image of a page, converts the image into a set of text, and identifies a plurality of tokens within the text. Each token includes one or more contiguous characters that are both preceded and followed by whitespace within the text. The processor identifies invalid tokens by removing tokens of the plurality of tokens that correspond to words of the dictionary. The processor calculates, based on a ratio of a total number of valid tokens to a total number of tokens, a score. In response to determining that the score is greater than a threshold, the processor applies the machine learning algorithm to classify the text into a category and stores the image and/or text in a database according to the category.
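
The token-ratio gate ahead of classification can be sketched in Python as below. Tokens are whitespace-delimited, as in the abstract. Stripping surrounding punctuation and lowercasing before the dictionary lookup, the `0.7` threshold, and the `classify` callable standing in for the trained model are all assumptions.

```python
import string
from typing import Callable, Optional, Set


def valid_token_ratio(text: str, dictionary: Set[str]) -> float:
    """Ratio of tokens found in the dictionary to all whitespace-delimited tokens."""
    tokens = text.split()  # contiguous characters bounded by whitespace
    if not tokens:
        return 0.0
    # Assumption: ignore surrounding punctuation and case when looking up a token.
    valid = sum(
        1 for t in tokens if t.strip(string.punctuation).lower() in dictionary
    )
    return valid / len(tokens)


def classify_if_clean(
    text: str,
    dictionary: Set[str],
    classify: Callable[[str], str],  # hypothetical stand-in for the trained model
    threshold: float = 0.7,          # illustrative value, not from the source
) -> Optional[str]:
    """Apply the classifier only when the score is greater than the threshold."""
    if valid_token_ratio(text, dictionary) > threshold:
        return classify(text)
    return None
```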