Patent classifications
G06V30/155
SYSTEMS AND METHODS FOR SEPARATING LIGATURE CHARACTERS IN DIGITIZED DOCUMENT IMAGES
Embodiments disclosed herein provide for systems and methods of separating characters associated with ligatures in digitized documents. The systems and methods provide for a ligature detection engine configured to identify the ligatures, and a ligature processing engine configured to identify and remove the glyphs attaching the separate characters forming the ligature.
GLYPH-AWARE UNDERLINING OF TEXT IN DIGITAL TYPOGRAPHY
A glyph-aware method for underlining text in digital typography includes identifying first and second intersection coordinates where first and second bounds of an underline region of the text intersect with an outline path of a glyph in the text. Where such intersections occur, a portion of the outline path of the glyph between the first and second intersection coordinates is copied. First and second offset coordinates for the underline are determined by adding or subtracting an offset to the first and second intersection coordinates. A first underline outline path is constructed in the underline region, where the first underline outline path includes the copied portion of the outline path of the glyph between the first and second intersection coordinates. A display device renders an underline, at least partially, along the first underline outline path between the first and second offset coordinates in the underline region of the text.
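The offset step can be illustrated with a simplified one-dimensional sketch. It assumes the intersection coordinates are already known as pairs of x values and that the offset widens the gap around each crossing (the abstract says only "adding or subtracting"). The function name `underline_segments` and its parameters are hypothetical, and the copied glyph outline itself is not modeled.

```python
from typing import List, Tuple

Span = Tuple[float, float]


def underline_segments(start: float, end: float,
                       crossings: List[Span], offset: float) -> List[Span]:
    """Return straight underline segments that stop short of each glyph crossing.

    `crossings` holds (first_x, second_x) intersection coordinates, one pair
    per place where a glyph outline passes through the underline region.
    Assumption: the offset is subtracted from the first coordinate and added
    to the second, so the gap is slightly wider than the glyph stroke.
    """
    segments: List[Span] = []
    cursor = start
    for first_x, second_x in sorted(crossings):
        gap_start = first_x - offset   # first offset coordinate
        gap_end = second_x + offset    # second offset coordinate
        if gap_start > cursor:
            segments.append((cursor, min(gap_start, end)))
        cursor = max(cursor, gap_end)  # skip past the gap, merging overlaps
    if cursor < end:
        segments.append((cursor, end))
    return segments
```

For example, an underline from 0 to 100 with one descender crossing at (40, 50) and an offset of 2 is split into the two segments (0, 38) and (52, 100).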
INFORMATION PROCESSING APPARATUS AND NON-TRANSITORY COMPUTER READABLE MEDIUM
An information processing apparatus includes a processor. The processor is configured to receive first image data representing a document, and generate, by processing corresponding to appearance characteristics of the document, second image data not representing information of a deletion target out of information represented in the first image data but representing information other than the information of the deletion target.
Glyph-aware underlining of text in digital typography
A glyph-aware method for underlining text in digital typography includes identifying first and second intersection coordinates where first and second bounds of an underline region of the text intersect with an outline path of a glyph in the text. Where such intersections occur, a portion of the outline path of the glyph between the first and second intersection coordinates is copied. First and second offset coordinates for the underline are determined by adding or subtracting an offset to the first and second intersection coordinates. A first underline outline path is constructed in the underline region, where the first underline outline path includes the copied portion of the outline path of the glyph between the first and second intersection coordinates. A display device renders an underline, at least partially, along the first underline outline path between the first and second offset coordinates in the underline region of the text.
AUTOMATED SCREENING OF MEDICAL DATA
Methods are provided for expediting screening of medical data. In some methods, medical records are obtained from a requester. Each medical record includes a digitized image of a body region of a patient. For each image, a processor is used to: perform an image-quality check of the image; perform a character-recognition process to locate a character in the image; mask the character to obtain a masked image; perform an identification process on the masked image to identify the body region of the image; perform an analysis routine on the masked image to determine a screening score, the analysis routine corresponding to the identified body region; and, if the screening score is within a normal range for the analysis routine, generate a normal notification indicating that the image is a normal image within a healthy range for the identified body region. The normal notification is automatically transmitted to a requester.
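The masking step and the normal-range check can be sketched as follows. This is a minimal illustration, assuming the image is a grid of pixel values and that the character-recognition step has already returned bounding boxes. The names `mask_characters` and `is_normal`, the box layout, and the fill value are all hypothetical and not taken from the abstract.

```python
from typing import List, Tuple

Image = List[List[int]]
# Assumed box layout: (top, left, bottom, right), bottom/right exclusive.
Box = Tuple[int, int, int, int]


def mask_characters(image: Image, boxes: List[Box], fill: int = 0) -> Image:
    """Return a copy of `image` with every pixel inside each box set to `fill`."""
    masked = [row[:] for row in image]
    for top, left, bottom, right in boxes:
        for y in range(max(top, 0), min(bottom, len(masked))):
            row = masked[y]
            for x in range(max(left, 0), min(right, len(row))):
                row[x] = fill
    return masked


def is_normal(score: float, normal_range: Tuple[float, float]) -> bool:
    """True when the screening score lies inside the inclusive normal range."""
    low, high = normal_range
    return low <= score <= high
```

The input image is left unmodified so that the original record stays available alongside the masked copy.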
Systems and methods for separating ligature characters in digitized document images
Embodiments disclosed herein provide for systems and methods of separating characters associated with ligatures in digitized documents. The systems and methods provide for a ligature detection engine configured to identify the ligatures, and a ligature processing engine configured to identify and remove the glyphs attaching the separate characters forming the ligature.
Handwriting Recognition for Receipt
An information processing method and apparatus are provided that perform operations including: identifying, from an image obtained via an image capture device, at least one character string that is relevant in identifying information to be extracted from the image; defining an area, within the image, that includes information as an information extraction area, the information including a plurality of information elements; selecting a region within the defined area where the information to be extracted is expected to be present using a feature within the defined area; removing the feature from the selected region; correcting one or more errors associated with the information caused by the removal of the feature; and extracting one or more alphanumeric characters from the corrected information, wherein the extracted one or more alphanumeric characters correspond to the elements of the information and are associated with a respective one of the at least one character strings.
DOCUMENT OPTICAL CHARACTER RECOGNITION
Vehicles and other items often have corresponding documentation, such as registration cards, that includes a significant amount of informative textual information that can be used in identifying the item. Traditional OCR may be unsuccessful when dealing with non-cooperative images. Accordingly, features such as dewarping, text alignment, and line identification and removal may aid in OCR of non-cooperative images. Dewarping involves determining curvature of a document depicted in an image and processing the image to dewarp the image of the document to make it more accurately conform to the ideal of a cooperative image. Text alignment involves determining an actual alignment of depicted text, even when the depicted text is not aligned with depicted visual cues. Line identification and removal involves identifying portions of the image that depict lines and removing those lines prior to OCR processing of the image.
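Line identification and removal can be illustrated on a binarized image. The sketch below is one simple heuristic, not the method claimed in the abstract: it treats any horizontal run of ink pixels at least `min_run` long as a ruled line and clears it. The function names and the run-length threshold are assumptions.

```python
from typing import List

Image = List[List[int]]  # 1 = ink, 0 = background


def remove_long_runs(row: List[int], min_run: int) -> List[int]:
    """Clear every horizontal run of ink that is at least `min_run` pixels long."""
    out = row[:]
    x = 0
    while x < len(row):
        if row[x]:
            start = x
            while x < len(row) and row[x]:
                x += 1
            if x - start >= min_run:   # long enough to count as a line
                for i in range(start, x):
                    out[i] = 0
        else:
            x += 1
    return out


def remove_horizontal_lines(image: Image, min_run: int) -> Image:
    """Apply the run-length filter to each row before passing the image to OCR."""
    return [remove_long_runs(row, min_run) for row in image]
```

Choosing `min_run` wider than the widest expected character stroke keeps ordinary text intact. Slanted or curved lines would need a different detector.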
OPTICAL CHARACTER RECOGNITION SUPPORT SYSTEM
A computer-implemented method for increasing a recognition rate of an optical character recognition (OCR) system is provided. The method includes preprocessing by receiving an image, and extracting all vertical lines from the image. The method includes adding vertical lines at character areas of the image, extracting all horizontal lines from the image, and creating an unlined image by removing all the vertical and horizontal lines from the image. The method further includes determining a border of a vertical direction of the unlined image based on the total of pixels of rows in each column, and adding vertical and horizontal auxiliary lines between characters of the unlined image. The method also includes postprocessing by receiving garbled words of OCR output, removing noise after morphological analysis, replacing garbled letters with correct ones based on a frequent edit operation, and outputting the correct word, weighting results of image distance calculations based on machine learning.
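The border determination "based on the total of pixels of rows in each column" resembles a vertical projection profile. The following is a minimal sketch under that assumption: it sums the ink pixels in each column of a binarized, unlined image and treats empty columns as borders between characters. The function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = ink, 0 = background


def column_profile(image: Image) -> List[int]:
    """Total ink pixels in each column (summed over all rows)."""
    if not image:
        return []
    return [sum(column) for column in zip(*image)]


def character_spans(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) column spans with ink, end exclusive.

    Columns whose total is zero are treated as borders between characters.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for x, total in enumerate(profile):
        if total > 0 and start is None:
            start = x
        elif total == 0 and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans
```

Auxiliary lines could then be placed at the gaps between consecutive spans. Touching characters leave no empty column, so this simple profile would not separate them.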
Information processing apparatus, information processing system, and non-transitory computer readable medium
An information processing apparatus includes: a first extracting unit that extracts a position of a character entry box in an input image; a recognizing unit that recognizes a character string written in the character entry box; a calculating unit that calculates recognition accuracy of each of characters of the character string recognized by the recognizing unit; a first detector that detects that a value based on the recognition accuracy is equal to or larger than a preset threshold value; a second extracting unit that extracts a position of a circumscribed rectangle for each character of the character string in the input image; a second detector that detects contact of the circumscribed rectangle with the character entry box; and a display that displays the character string to be corrected on the basis of a result of detection by the first detector and a result of detection by the second detector.
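The two detectors can be sketched as simple predicates. The abstract does not say how the value is derived from recognition accuracy or how the two results are combined for display, so the sketch below returns both results separately. The `Rect` type, the `tolerance` parameter, and the function names are assumptions.

```python
from typing import List, NamedTuple, Tuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def touches_entry_box(char_rect: Rect, entry_box: Rect, tolerance: int = 0) -> bool:
    """True when the character's circumscribed rectangle reaches any box edge."""
    return (char_rect.left - entry_box.left <= tolerance
            or char_rect.top - entry_box.top <= tolerance
            or entry_box.right - char_rect.right <= tolerance
            or entry_box.bottom - char_rect.bottom <= tolerance)


def detector_results(value: float, threshold: float,
                     char_rects: List[Rect], entry_box: Rect) -> Tuple[bool, bool]:
    """Return (first detector result, second detector result).

    First detector: the accuracy-based value is at or above the threshold.
    Second detector: at least one character rectangle contacts the entry box.
    """
    first = value >= threshold
    second = any(touches_entry_box(r, entry_box) for r in char_rects)
    return first, second
```

For example, with an entry box `Rect(0, 0, 100, 50)`, a character at `Rect(10, 0, 30, 45)` touches the top edge, while one at `Rect(10, 5, 30, 45)` does not.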