G06V30/26

TEXT EXTRACTION USING OPTICAL CHARACTER RECOGNITION

Provided herein are systems and methods for extracting text from a document. Different optical character recognition (OCR) tools are used to extract different versions of the text in the document. Metrics evaluating the quality of the extracted text are compared to identify and select higher quality extracted text. A selected portion of text is compared to a threshold to ensure minimal quality. The selected portion of text is then saved. Error correction can be applied to the selected portion of text based on errors specific to the OCR tools or the document contents.
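
The selection step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which quality metric is used, so the `dictionary_hit_rate` metric, the function names, and the tool names are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def dictionary_hit_rate(text: str, vocabulary: Set[str]) -> float:
    """Hypothetical quality metric: fraction of tokens found in a vocabulary."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in vocabulary for t in tokens) / len(tokens)


def select_best_text(
    candidates: Dict[str, str],
    vocabulary: Set[str],
    threshold: float,
) -> Optional[Tuple[str, str, float]]:
    """Pick the highest-scoring OCR output, then enforce a minimum quality.

    `candidates` maps an (assumed) OCR tool name to the text it extracted.
    Returns (tool, text, score), or None if nothing meets the threshold.
    """
    scored = [
        (dictionary_hit_rate(text, vocabulary), tool, text)
        for tool, text in candidates.items()
    ]
    if not scored:
        return None
    score, tool, text = max(scored, key=lambda item: item[0])
    if score < threshold:
        return None
    return tool, text, score


if __name__ == "__main__":
    vocab = {"the", "quick", "brown", "fox"}
    outputs = {
        "tool_a": "The qu1ck brown f0x",
        "tool_b": "The quick brown fox",
    }
    print(select_best_text(outputs, vocab, threshold=0.8))
```

Returning `None` when the best candidate falls below the threshold leaves the caller to decide whether to retry with other tools or flag the document for review.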

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM
20230368555 · 2023-11-16

A user interface screen for inputting property information on a scanned image includes an input field that is filled in automatically based on the results of character recognition processing performed on a character area within the scanned image. When the result of the character recognition processing is a numerical value, the value entered automatically in the input field is that numerical value after it has been changed in accordance with a predetermined interpretation rule of numerical value representation.
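
One possible reading of an "interpretation rule of numerical value representation" is a convention for grouping and decimal separators. The sketch below assumes that reading; the function name and its parameters are hypothetical and are not taken from the application.

```python
from decimal import Decimal, InvalidOperation


def apply_interpretation_rule(ocr_text: str, group_sep: str, decimal_sep: str) -> str:
    """Rewrite a recognized number using an assumed separator convention.

    Grouping separators are dropped and the decimal separator becomes ".".
    Text that does not parse as a number afterwards is returned unchanged.
    """
    cleaned = ocr_text.strip().replace(group_sep, "")
    if decimal_sep != ".":
        cleaned = cleaned.replace(decimal_sep, ".")
    try:
        Decimal(cleaned)  # validation only; the string form is what gets returned
    except InvalidOperation:
        return ocr_text
    return cleaned


if __name__ == "__main__":
    # "1.234,56" read with "." as grouping separator and "," as decimal separator
    print(apply_interpretation_rule("1.234,56", group_sep=".", decimal_sep=","))
    # The same amount written with the opposite convention
    print(apply_interpretation_rule("1,234.56", group_sep=",", decimal_sep="."))
```

Returning the original text for non-numeric input keeps the rule from altering fields such as names or labels.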

CHARACTER INPUT DEVICE, CHARACTER INPUT METHOD, AND COMPUTER-READABLE STORAGE MEDIUM STORING A CHARACTER INPUT PROGRAM
20220215681 · 2022-07-07

A first character string obtainment unit according to one or more embodiments may obtain a first character string in response to an input character string that has been input. A similar character extraction unit extracts similar characters whose shapes are similar to those of characters in the first character string. A second character string generation unit generates one or more second character strings in which some or all of the characters in the first character string are replaced with similar characters extracted by the similar character extraction unit. Then, a conversion candidate output unit outputs the first character string and the second character strings as conversion candidates for the input character string.
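
The candidate generation described above can be sketched as a Cartesian product over each character and its look-alikes. The `SIMILAR` table, the function name, and the `limit` parameter are illustrative assumptions; the application does not specify how similar characters are stored or how many candidates are produced.

```python
from itertools import product
from typing import Dict, List

# Hypothetical table of visually similar characters.
SIMILAR: Dict[str, List[str]] = {
    "0": ["O"],
    "O": ["0"],
    "1": ["l"],
    "l": ["1"],
}


def conversion_candidates(
    first: str,
    similar: Dict[str, List[str]] = SIMILAR,
    limit: int = 50,
) -> List[str]:
    """Return the first string followed by variants with look-alike swaps.

    Each position offers the original character plus its similar characters,
    so the first combination produced is always the unmodified string.
    """
    options = [[ch] + similar.get(ch, []) for ch in first]
    candidates: List[str] = []
    for combo in product(*options):
        candidates.append("".join(combo))
        if len(candidates) >= limit:
            break
    return candidates


if __name__ == "__main__":
    print(conversion_candidates("1O"))
```

The `limit` guard matters because the number of combinations grows multiplicatively with the number of replaceable characters.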

HANDWRITING INPUT DISPLAY APPARATUS, HANDWRITING INPUT DISPLAY METHOD AND RECORDING MEDIUM STORING PROGRAM
20210303836 · 2021-09-30

A handwriting input display apparatus causes display means to display a stroke generated by an input made by using input means to a screen as a handwritten object. The apparatus includes display control means for causing the display means to display character string candidates including a handwriting recognition candidate when the handwritten object does not change for a predetermined time. When the handwriting recognition candidate is selected, the display control means causes the display means to erase a display of the character string candidates and a display of the handwritten object, and causes the display means to display a character string object at a position where the erased handwritten object was displayed. When selection of the handwriting recognition candidate is not performed for a predetermined time and the display of the character string candidates is erased, the display control means causes the handwritten object to be kept displayed.

Multi-modal learning based intelligent enhancement of post optical character recognition error correction

A mechanism is provided for implementing an optical character recognition (OCR) error correction mechanism for correcting OCR errors. Responsive to receiving a document in which OCR has been performed, the mechanism assesses the document to identify a set of OCR errors generated by an OCR engine that performed the OCR using a set of visual embeddings. Responsive to identifying the set of OCR errors, the mechanism analyzes each character of a plurality of sentences within the document to generate a high-dimensional embedding for the characters of the plurality of sentences within the document. The mechanism then linguistically corrects each OCR error in the set of OCR errors. The mechanism utilizes ground truth information and the set of visual embeddings to verify that the character stream is linguistically correct. Responsive to verifying that the character stream is linguistically correct, the mechanism outputs an OCR error corrected document to a user.

METHODS, SYSTEMS, APPARATUS AND ARTICLES OF MANUFACTURE FOR RECEIPT DECODING

Methods, apparatus, systems and articles of manufacture are disclosed for receipt decoding. An example apparatus for processing a receipt associated with a user disclosed herein includes: an optical character recognition engine to generate bounding boxes, respective ones of the bounding boxes associated with groups of characters detected in the receipt, the bounding boxes including a first bounding box, a second bounding box and a third bounding box; a word connector to connect the first bounding box to the second bounding box based on (1) an adjacency of the first bounding box to the second bounding box and (2) a difference value from a comparison of a location of the first bounding box to a location of the second bounding box; a line connector to form a line of the ones of the bounding boxes by connecting the third bounding box to the second bounding box based on a relationship between the first bounding box and the second bounding box, the line of the ones of the bounding boxes indicative of related receipt fields; and a creditor to generate a report based on the line.
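
The idea of connecting bounding boxes by adjacency and by a location difference can be illustrated with a simplified sketch. It is not the disclosed word connector or line connector: the `Box` fields, the `max_gap` and `max_dy` thresholds, and the greedy left-to-right grouping are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def should_connect(a: Box, b: Box, max_gap: float, max_dy: float) -> bool:
    """True if b sits just to the right of a and is vertically aligned with it."""
    gap = b.x - (a.x + a.w)  # horizontal distance between the boxes
    dy = abs((a.y + a.h / 2) - (b.y + b.h / 2))  # vertical center offset
    return 0 <= gap <= max_gap and dy <= max_dy


def build_lines(boxes: List[Box], max_gap: float, max_dy: float) -> List[List[Box]]:
    """Greedily chain boxes into lines, scanning from left to right."""
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b.x):
        for line in lines:
            if should_connect(line[-1], box, max_gap, max_dy):
                line.append(box)
                break
        else:
            lines.append([box])  # no existing line accepts this box
    return lines


if __name__ == "__main__":
    receipt = [
        Box("MILK", 0, 10, 40, 10),
        Box("2.99", 50, 11, 30, 10),
        Box("BREAD", 0, 30, 50, 10),
        Box("1.50", 60, 30, 30, 10),
    ]
    for line in build_lines(receipt, max_gap=20, max_dy=5):
        print(" ".join(b.text for b in line))
```

Sorting by the left edge rather than by the top edge keeps small vertical jitter between boxes on the same printed line from breaking the chain.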

Character restoration method and apparatus, storage medium, and electronic device
11902522 · 2024-02-13

A character restoration method and apparatus, a storage medium, and an electronic device are provided. The character restoration method includes: determining a character identifier of a character in a text region, where the character identifier uniquely identifies the character; performing encoding at least according to the character identifier; and sending the encoded data to a receiving end, which decodes the encoded data and restores the character according to the character identifier obtained after decoding. That is, encoding is performed according to only a small amount of information, and that information is recovered by decoding so as to restore the character.
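
A minimal sketch of the encode, send, decode, restore round trip is shown below. It assumes the character identifier is the Unicode code point and that each character also carries a pixel position; the patent does not state either, and the record layout and function names are hypothetical.

```python
import struct
from typing import List, Tuple

# Assumed record layout: 32-bit code point, then 16-bit x and y positions.
RECORD = struct.Struct("<IHH")

Char = Tuple[str, int, int]  # (character, x, y)


def encode_characters(chars: List[Char]) -> bytes:
    """Encode each character as its identifier plus position, not as pixels."""
    return b"".join(RECORD.pack(ord(ch), x, y) for ch, x, y in chars)


def decode_characters(data: bytes) -> List[Char]:
    """Recover the identifiers and positions so the text can be redrawn."""
    return [(chr(cp), x, y) for cp, x, y in RECORD.iter_unpack(data)]


if __name__ == "__main__":
    region = [("A", 10, 20), ("文", 30, 20)]
    payload = encode_characters(region)
    print(len(payload), "bytes")
    print(decode_characters(payload))
```

Each character costs a fixed eight bytes in this layout, which reflects the abstract's point that only a small amount of information needs to be encoded.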