G06V30/158

Systems and methods for separating ligature characters in digitized document images
11710331 · 2023-07-25

Embodiments disclosed herein provide for systems and methods of separating characters associated with ligatures in digitized documents. The systems and methods provide for a ligature detection engine configured to identify the ligatures, and a ligature processing engine configured to identify and remove the glyphs attaching the separate characters forming the ligature.

Text line normalization systems and methods

A method for estimating text heights of text line images includes estimating a text height with a sequence recognizer. The method further includes normalizing a vertical dimension and/or position of text within a text line image based on the text height. The method may also further include calculating a feature of the text line image. In some examples, the sequence recognizer estimates the text height with a machine learning model.
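Once a text height estimate is available, the normalization step reduces to a rescale. The following is a minimal sketch, not the patented method: the function name `normalization_params`, the target height of 32 pixels, and the use of uniform scaling are all assumptions, and the sequence recognizer that produces the estimate is not shown.

```python
def normalization_params(image_height, image_width, text_height,
                         target_text_height=32):
    """Return (scale, new_height, new_width) for a text line image.

    text_height is the estimate produced upstream (e.g. by a sequence
    recognizer); target_text_height is a hypothetical constant.
    """
    if text_height <= 0:
        raise ValueError("text_height must be positive")
    # Uniform scale that maps the estimated text height onto the target.
    scale = target_text_height / text_height
    return scale, round(image_height * scale), round(image_width * scale)
```

For example, a 60 x 400 line image whose text is estimated at 16 pixels tall would be scaled by a factor of 2 under these assumptions.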

LINE ITEM DETECTION IN BORDERLESS TABULAR STRUCTURED DATA

In an approach, a processor identifies a plurality of text separators in a borderless table, a text separator of the plurality of text separators defining a non-text region between two consecutive text lines in the borderless table. A processor classifies the plurality of text separators into a number of target clusters comprised in a target group based on property information related to the plurality of text separators, the number of target clusters corresponding to a number of separator types. A processor provides indication information to indicate respective separator types of the plurality of text separators based on a result of the classifying.
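The classification step can be illustrated with a small one-dimensional clustering. The sketch below assumes that the only property used is the height of each separator and that plain k-means stands in for whatever classifier the approach actually uses; the abstract specifies neither, and `cluster_separators` is a hypothetical name.

```python
def cluster_separators(heights, k=2, iterations=20):
    """Assign each separator height to one of k clusters (1-D k-means).

    heights: vertical size of each non-text gap between consecutive lines.
    Returns a list of cluster labels, ordered from smallest to largest gap.
    """
    lo, hi = min(heights), max(heights)
    # Deterministic initialization: centers evenly spaced over the range.
    if k == 1:
        centers = [(lo + hi) / 2]
    else:
        centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(heights)
    for _ in range(iterations):
        # Assignment step: nearest center wins.
        labels = [min(range(k), key=lambda c: abs(h - centers[c]))
                  for h in heights]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [h for h, lab in zip(heights, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels
```

With two clusters, small gaps would plausibly correspond to separators inside a line item and large gaps to separators between line items, although that interpretation is also an assumption.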

IDENTIFY CARD NUMBER
20230215201 · 2023-07-06

A card number recognition method and apparatus, a storage medium, and an electronic device are disclosed. The method includes: obtaining distribution format information of character bits of a card number sequence, where the distribution format information includes character bit spacing information of the card number sequence; recognizing a character sequence in a target image through a neural network model trained in advance, and obtaining character bit spacing information of the recognized character sequence; determining whether the character bit spacing information of the recognized character sequence is consistent with the character bit spacing information in the obtained distribution format information; and if the character bit spacing information of the character sequence is consistent with the character bit spacing information in the obtained distribution format information, determining that the recognized character sequence is the target card number.
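The consistency check can be sketched as a comparison of digit groupings. This example assumes that the spacing information takes the form of pixel gaps between consecutive recognized characters and that the expected format is a tuple of group sizes such as (4, 4, 4, 4); the function names and the `wide_gap` threshold are hypothetical, and the neural network recognizer is not shown.

```python
def group_sizes(gaps, wide_gap):
    """Derive group lengths from the gaps between consecutive characters.

    gaps[i] is the distance between character i and character i + 1;
    a gap of at least wide_gap is treated as a group boundary.
    """
    sizes, current = [], 1
    for gap in gaps:
        if gap >= wide_gap:
            sizes.append(current)
            current = 1
        else:
            current += 1
    sizes.append(current)
    return sizes


def is_target_card_number(chars, gaps, expected_sizes, wide_gap=10):
    """True if the recognized sequence matches the expected grouping."""
    if len(gaps) != len(chars) - 1:
        return False  # spacing data does not line up with the characters
    return group_sizes(gaps, wide_gap) == list(expected_sizes)
```

A sequence such as "62220212" with one wide gap after the fourth digit would match an expected format of (4, 4) and be rejected against (6, 2).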

IMAGE PROCESSING SYSTEM AND IMAGE PROCESSING METHOD
20230029990 · 2023-02-02

An image processing system according to the present embodiment acquires a processing target image read from an original that is handwritten and specifies one or more handwritten areas included in the acquired processing target image. In addition, for each specified handwritten area, the present image processing system extracts from the processing target image a handwritten character image and a handwritten area image indicating an approximate shape of a handwritten character. Furthermore, for a handwritten area including a plurality of lines of handwriting among the specified one or more handwritten areas, a line boundary of handwritten characters is determined from a frequency of pixels indicating a handwritten area in a line direction of the handwritten area image, and a corresponding handwritten area is separated into each line.
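The line-boundary step resembles a projection profile: count the handwritten-area pixels in each row and cut where the count drops. The sketch below is one way to picture it, assuming the handwritten area image is a binary mask stored as nested lists and that text runs horizontally; `split_lines` and the `min_pixels` threshold are hypothetical, since the abstract does not say how the boundary is chosen from the frequencies.

```python
def split_lines(mask, min_pixels=1):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    mask: list of rows, each a list of 0/1 values (1 = handwritten area).
    Rows with fewer than min_pixels marked pixels are treated as gaps.
    """
    # Frequency of handwritten-area pixels along the line direction.
    profile = [sum(row) for row in mask]
    lines, start = [], None
    for y, count in enumerate(profile):
        if count >= min_pixels and start is None:
            start = y                      # a line begins
        elif count < min_pixels and start is not None:
            lines.append((start, y))       # a line ends at the gap
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines
```

Each returned row range can then be cropped out of the handwritten area as a separate single-line area.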

Method and system for segmenting touching text lines in image of uchen-script Tibetan historical document

A method and system for segmenting touching text lines in an image of a uchen-script Tibetan historical document are provided. The method includes: first obtaining a binary image of a uchen-script Tibetan historical document after layout analysis; detecting local baselines in the binary image, to generate a local baseline information set; detecting and segmenting a touching region in the binary image according to the local baseline information set, to generate a touching-region-segmented image; allocating connected components in the touching-region-segmented image to corresponding lines, to generate a text line allocation result; and splitting text lines in the touching-region-segmented image according to the text line allocation result, to generate a line-segmented image. In the present disclosure, touching text lines in a Tibetan historical document can be effectively segmented, and text line segmentation efficiency of the Tibetan historical document is improved.

PICTURE SEARCH METHOD AND APPARATUS, ELECTRONIC DEVICE, COMPUTER-READABLE STORAGE MEDIUM
20230082638 · 2023-03-16

A picture search method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product, relating to the field of artificial intelligence. The method includes: obtaining an OCR result of pictures in a preset picture library in response to a picture search request; traversing pictures in the preset picture library that have not been subjected to low-dimensional OCR processing and high-dimensional OCR processing, and performing the low-dimensional OCR processing based on an OCR threshold on each of the traversed pictures to obtain a low-dimensional OCR result of each corresponding picture; determining a target picture matching a key character string in the preset picture library according to at least one of the low-dimensional OCR result and the high-dimensional OCR result of each picture; and determining the target picture as a search result of the picture search request, and displaying the search result.

CONTENT RECOGNITION METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM

A method for content recognition includes acquiring, from a content for recognition, a text piece and a media piece associated with the text piece, performing a first feature extraction on the text piece to obtain text features, performing a second feature extraction on the media piece associated with the text piece to obtain media features, and determining feature association measures between the media features and the text features. A feature association measure for a first feature in the media features and a second feature in the text features indicates an association degree between the first feature and the second feature. The method further includes adjusting the text features based on the feature association measures to obtain adjusted text features, and performing a recognition based on the adjusted text features to obtain a content recognition result of the content. Apparatus and non-transitory computer-readable storage medium counterpart embodiments are also contemplated.

SERIAL NUMBER RECOGNITION PARAMETER DETERMINATION APPARATUS, SERIAL NUMBER RECOGNITION PARAMETER DETERMINATION PROGRAM, AND PAPER SHEET HANDLING SYSTEM
20230186663 · 2023-06-15

A serial number recognition parameter determination apparatus includes: a generation unit, an identification unit, and an evaluation index calculation unit. The generation unit generates a parameter set of a program, the program being used when a paper sheet handling apparatus identifies, from an image of a paper sheet, character present regions in which characters that form a serial number are present. The identification unit identifies, from an image of the paper sheet, the character present regions by using the parameter set that is generated by the generation unit. The evaluation index calculation unit calculates an evaluation index of the parameter set based on the character present regions that are identified by the identification unit.

Translation device that determines whether two consecutive lines in an image should be translated together or separately

A condition determining section (24) determines whether or not two consecutive lines in an image meet a joining condition that is based on a characteristic of a language of a character string, the two consecutive lines being extracted from the character string composed of a plurality of lines. In a case where the joining condition is met, an extracted line joining section (25) and a translation section (26) join and then translate the two consecutive lines.
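A joining condition of this kind can be illustrated for English text. The rule below is an assumption made for the example, not the condition claimed in the patent: two lines are joined when the first does not end in sentence-final punctuation and the second starts with a lowercase letter. The names `should_join` and `merge_lines` are hypothetical, and the translation step itself is not shown.

```python
def should_join(prev_line, next_line):
    """Hypothetical English joining condition for two consecutive lines."""
    prev_line = prev_line.rstrip()
    next_line = next_line.lstrip()
    if not prev_line or not next_line:
        return False
    ends_sentence = prev_line[-1] in ".!?:"
    return not ends_sentence and next_line[0].islower()


def merge_lines(lines):
    """Join consecutive lines that meet the condition; keep others apart."""
    merged = []
    for line in lines:
        if merged and should_join(merged[-1], line):
            merged[-1] = merged[-1].rstrip() + " " + line.strip()
        else:
            merged.append(line.strip())
    return merged
```

Under this rule, the lines "Keep off" and "the grass." would be translated as one unit, while a following "Exit" would be translated separately.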