G06V30/1463

PORTABLE DOCUMENT FORMAT (PDF) DOCUMENT PROCESSING METHOD, ELECTRONIC DEVICE, AND STORAGE MEDIUM

A Portable Document Format (PDF) document processing method, in the fields of natural language processing and computer vision within artificial intelligence, includes: performing Optical Character Recognition (OCR) on a PDF document in image form to obtain recognized target results and first coordinate information for each target result, where each target result includes one character or a character segment of at least two consecutive characters; for each target result, converting the first coordinate information into corresponding PDF coordinates and using the PDF coordinates as second coordinate information; determining a target font library for rewriting based on the one or more characters in each target result; and rewriting the one or more characters of each target result into the PDF document based on the second coordinate information and the target font library, to obtain the desired target document.
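The coordinate-conversion and font-selection steps can be sketched as follows. This is a minimal illustration, not the claimed method: it assumes the OCR box is in pixels with a top-left origin, that the page image was rendered at a known `dpi`, and that PDF user space uses 72 points per inch with a lower-left origin. The function names and the CJK-range font rule are hypothetical.

```python
def pixel_box_to_pdf(box_px, dpi, page_height_pt):
    """Convert a (left, top, right, bottom) pixel box to a PDF box in points.

    Assumes the image was rendered at `dpi` and PDF y grows upward from the
    bottom of the page, while pixel rows grow downward from the top.
    """
    scale = 72.0 / dpi  # points per pixel
    left, top, right, bottom = box_px
    x0 = left * scale
    x1 = right * scale
    # Flip the y-axis: the pixel bottom edge becomes the lower PDF y value.
    y0 = page_height_pt - bottom * scale
    y1 = page_height_pt - top * scale
    return (x0, y0, x1, y1)


def choose_font_library(text):
    """Hypothetical rule: pick a CJK font if any character is a CJK Unified Ideograph."""
    if any("\u4e00" <= ch <= "\u9fff" for ch in text):
        return "cjk-font"
    return "latin-font"


# An A4 page (842 pt tall) rendered at 144 dpi: 2 pixels per point.
print(pixel_box_to_pdf((100, 200, 300, 240), dpi=144, page_height_pt=842))
print(choose_font_library("中文"), choose_font_library("abc"))
```

The y-flip is the step most often missed: without it, rewritten text lands mirrored vertically on the page.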

Processing multiple documents in an image

Disclosed are various embodiments for processing multiple documents in an image. First, text from each of one or more documents in an image can be identified. An orientation of each of the one or more documents can be determined based at least in part on an alignment of the text of each of the one or more documents. Additionally, using an object detection model, the one or more documents in the image can be identified based at least in part on the orientation of each of the one or more documents. Finally, the one or more documents can be separated from the image into one or more separate image files, each separate image file representing a respective document of the one or more documents.
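A rough sketch of two of these steps, under stated assumptions: each document's orientation is estimated by snapping the angles of its OCR text lines to the nearest multiple of 90 degrees and taking the most common value, and documents are separated by cropping boxes that an object detection model (not shown) is assumed to have produced. The function names are hypothetical, and the crops are returned in memory rather than written to separate files.

```python
from collections import Counter


def document_orientation(line_angles_deg):
    """Snap each text-line angle to the nearest multiple of 90 and return the most common one."""
    snapped = [(round(a / 90.0) * 90) % 360 for a in line_angles_deg]
    return Counter(snapped).most_common(1)[0][0]


def split_documents(image, boxes):
    """Crop each (left, top, right, bottom) box out of a row-major image (a list of rows)."""
    return [[row[l:r] for row in image[t:b]] for (l, t, r, b) in boxes]


image = [[r * 4 + c for c in range(4)] for r in range(4)]
print(document_orientation([88, 91, 93, 2]))
print(split_documents(image, [(0, 0, 2, 2), (2, 2, 4, 4)]))
```

Voting over many text lines makes the estimate tolerant of a few misaligned lines, such as stamps or handwritten notes.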

Apparatus, method, and computer-readable storage medium for determining a rotation angle of text
09552527 · 2017-01-24

An apparatus, method, and computer-readable storage medium for determining a rotation angle of text. The method includes computing, for each object of a plurality of objects included in text within an image, a distance to a closest neighboring object, computing an average distance of the distances to the closest neighboring objects, determining a ratio between the average distance and an average font stroke width, the average font stroke width being an average of a font stroke width of each of the plurality of objects, and determining a rotation angle of the text by comparing the ratio to a threshold value.
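The ratio test described above can be sketched as follows. The abstract does not say which rotation angle corresponds to a ratio above or below the threshold, so this sketch takes both candidate angles as caller-supplied parameters; object centroids and stroke widths are assumed to come from an earlier segmentation step (not shown), and at least two objects are required.

```python
import math


def nearest_neighbor_distances(centroids):
    """For each object centroid, return the Euclidean distance to its closest other centroid."""
    dists = []
    for i, (xi, yi) in enumerate(centroids):
        best = min(math.hypot(xi - xj, yi - yj)
                   for j, (xj, yj) in enumerate(centroids) if j != i)
        dists.append(best)
    return dists


def estimate_rotation(centroids, stroke_widths, threshold, angle_above, angle_below):
    """Return (ratio, angle), where ratio = average NN distance / average stroke width.

    Which angle belongs to which side of the threshold is an assumption left to the caller.
    """
    avg_dist = sum(nearest_neighbor_distances(centroids)) / len(centroids)
    avg_stroke = sum(stroke_widths) / len(stroke_widths)
    ratio = avg_dist / avg_stroke
    return ratio, (angle_above if ratio > threshold else angle_below)


print(estimate_rotation([(0, 0), (10, 0), (20, 0)], [2, 2, 2], 4.0, 0, 90))
```

Normalizing by stroke width makes the ratio roughly independent of font size, so a single threshold can serve text at different scales.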

MOBILE CHECK DEPOSIT SYSTEM AND METHOD

A computer-implemented method is provided for a mobile device to: detect, by a camera of the mobile device, a plurality of checks; determine, by a processing unit of the mobile device, that an image of the plurality of checks is of sufficient quality; instruct, by a display of the mobile device, a user to take a photograph of the plurality of checks; crop, by the processing unit, the photograph of the plurality of checks into a plurality of images, wherein each of the plurality of images contains one of the plurality of checks; and transmit, by a transmitter of the mobile device, the plurality of images to a server via a network. The plurality of images may be transmitted individually (i.e., one at a time) or, alternatively, collectively in a single payload.
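The quality check and the two transmission modes might look like this in outline. The abstract does not specify how "sufficient quality" is measured; this sketch substitutes a common sharpness proxy, the variance of a 4-neighbour Laplacian over a grayscale image, and any acceptance threshold would be a tuning choice. `laplacian_variance` and `build_payloads` are hypothetical names, and images are plain lists of rows.

```python
def laplacian_variance(gray):
    """Sharpness proxy: variance of the 4-neighbour Laplacian over interior pixels."""
    h, w = len(gray), len(gray[0])
    vals = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            vals.append(lap)
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def build_payloads(check_images, collective):
    """Return one payload per check image, or a single payload holding all of them."""
    if collective:
        return [list(check_images)]
    return [[img] for img in check_images]


flat = [[0] * 4 for _ in range(4)]
checker = [[255 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]
print(laplacian_variance(flat), laplacian_variance(checker))
print(build_payloads(["check1", "check2"], collective=False))
```

A blurry photo has weak edges and therefore a low Laplacian variance, which is why this measure is a popular stand-in for focus quality.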

SYSTEM AND METHOD TO CREATE MACHINE-READABLE CODE USING OPTICAL CHARACTER RECOGNITION

A system to create a machine-readable code includes a digital camera; a computer including a processor communicatively coupled to the digital camera; and a monitor to display a graphical user interface provided by the computer, wherein the processor is configured to execute an application stored in a memory that is communicatively coupled to the processor, the application, when executed, causing the processor to: command the digital camera to capture a first image of identification information marked on an article, perform optical character recognition of the first image to identify characters of the identification information and create therefrom a second image, display the first image and the second image via the graphical user interface on the monitor so that the first image can be compared to the second image, and create a machine-readable code representing the identification information.
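The abstract does not name a symbology for the machine-readable code. As one illustrative assumption, the OCR result could be encoded as a Code 39 barcode string with the optional modulo-43 check character and the `*` start/stop delimiters. The function name is hypothetical, and drawing the bars is not shown.

```python
CODE39_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%"  # value = index (0..42)


def code39_payload(ocr_text):
    """Build a Code 39 data string with a mod-43 check character from OCR output."""
    data = ocr_text.strip().upper()
    for ch in data:
        if ch not in CODE39_CHARS:
            raise ValueError(f"character {ch!r} cannot be encoded in Code 39")
    check = CODE39_CHARS[sum(CODE39_CHARS.index(ch) for ch in data) % 43]
    # '*' is the Code 39 start/stop character.
    return "*" + data + check + "*"


print(code39_payload("code39"))
```

Rejecting unsupported characters up front gives the operator a chance to correct the OCR result while the first and second images are still displayed side by side.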

DOCUMENT ROTATION DETECTION AND CORRECTION
20250182511 · 2025-06-05

Certain aspects of the disclosure provide a method for generating training data and training a machine learning model. The method may include rotating each document image in a first set of document images by a plurality of rotation angles to obtain a first set of rotated document images, and associating a rotation classification label with each rotated document image in the first set of rotated document images. The method may further include, for each document image in a second set of document images: rotating the respective document image by a plurality of rotation angles, performing an optical character recognition analysis at each rotation angle of the plurality of rotation angles, generating a confidence score based on the optical character recognition analyses, assigning the confidence score to the respective document image, and associating a rotation classification with the respective document image based on the optical character recognition analyses.
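Both labelling paths can be sketched with images as lists of rows and rotations restricted to multiples of 90 degrees. Two assumptions are labelled here: `ocr_confidence` is a hypothetical callable standing in for a real OCR engine, and the document's score and label are taken from the angle with the highest OCR confidence. The abstract does not say how the per-angle results are combined.

```python
ANGLES = (0, 90, 180, 270)


def rotate90(img):
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotate(img, angle):
    """Rotate clockwise by a multiple of 90 degrees."""
    for _ in range((angle // 90) % 4):
        img = rotate90(img)
    return img


def label_first_set(images):
    """First set: each rotated copy is labelled with the angle it was rotated by."""
    return [(rotate(img, a), a) for img in images for a in ANGLES]


def label_second_set(images, ocr_confidence):
    """Second set: OCR every rotation; keep the best score and the angle that produced it."""
    labelled = []
    for img in images:
        scores = {a: ocr_confidence(rotate(img, a)) for a in ANGLES}
        best = max(scores, key=scores.get)  # rotation that makes the text most readable
        labelled.append((img, scores[best], best))
    return labelled


img = [[1, 2], [3, 4]]
fake_ocr = lambda im: 1.0 if im == [[3, 1], [4, 2]] else 0.1  # hypothetical OCR stub
print(label_first_set([img])[1])
print(label_second_set([img], fake_ocr))
```

The first set gets labels for free from the synthetic rotation, while the second set derives labels from OCR, which lets real scans of unknown orientation contribute to training.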

Method and system to detect a text from multimedia content captured at a scene

Detecting textual phrases in a non-horizontal orientation in a scene is a critical problem. This disclosure relates to a processor-implemented method to detect text in multimedia content captured at a scene. An input original image is processed by a trained model to obtain individual characters, each with a bounding box on the original image. If the number of detected characters is not equal to the number of expected characters on the original image, the original image is rotated according to a gradient to obtain a rotated image. If the number of detected characters is not equal to the number of expected characters on the rotated image, at least one missing character bounding box is estimated on the original image and on the rotated image to construct a horizontal text image. At least one missing character in the estimated bounding box is then detected using text returned by an optical character reader.
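The bounding-box estimation step can be illustrated under a simplifying assumption the abstract does not state: characters on a horizontal line have a roughly uniform pitch, and at least one adjacent pair of detected boxes is consecutive, so the smallest center-to-center gap approximates the pitch. Larger gaps are filled with evenly spaced boxes of average width. The function is hypothetical and only fills gaps between detected characters, not at the ends of the line.

```python
def estimate_missing_boxes(boxes, expected_count):
    """Estimate boxes for characters missing between detections on one horizontal line.

    `boxes` are (left, top, right, bottom) tuples sorted left to right.
    """
    if len(boxes) >= expected_count or len(boxes) < 2:
        return []
    centers = [(l + r) / 2 for l, _, r, _ in boxes]
    pitch = min(b - a for a, b in zip(centers, centers[1:]))  # assumed character pitch
    width = sum(r - l for l, _, r, _ in boxes) / len(boxes)
    top = min(t for _, t, _, _ in boxes)
    bottom = max(b for _, _, _, b in boxes)
    missing = []
    for a, b in zip(centers, centers[1:]):
        k = round((b - a) / pitch) - 1  # number of characters that fit in this gap
        for i in range(1, k + 1):
            c = a + i * (b - a) / (k + 1)
            missing.append((c - width / 2, top, c + width / 2, bottom))
    return missing[: expected_count - len(boxes)]


detected = [(0, 0, 8, 10), (10, 0, 18, 10), (30, 0, 38, 10)]
print(estimate_missing_boxes(detected, expected_count=4))
```

Each estimated box could then be cropped and passed to an OCR engine to recover the missed character.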

Method and apparatus for recognizing text, storage medium, and electronic device

Provided is a method for recognizing a text. The method includes: acquiring a text image; determining at least one text box in the text image, wherein each of the at least one text box corresponds to at least one word; determining, from the at least one text box, a text box to be recognized; determining a picture unit corresponding to the text box to be recognized in the text image; rotating the picture unit to a target posture; and determining a target recognition result by performing text recognition on the picture unit in the target posture.
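Rotating a text box to a target posture can be sketched geometrically. This assumes the box is a quadrilateral given as (top-left, top-right, bottom-right, bottom-left) corners and that the target posture is a horizontal, left-to-right top edge. The same angle would then be applied to the cropped picture unit with an image library of choice (not shown). All names are hypothetical.

```python
import math


def top_edge_angle(quad):
    """Angle in degrees of the top edge (top-left -> top-right) relative to the x-axis."""
    (x0, y0), (x1, y1) = quad[0], quad[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


def rotate_points(points, angle_deg, center):
    """Rotate points by angle_deg about center, using the same axes as atan2 above."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    cx, cy = center
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c) for x, y in points]


def to_target_posture(quad):
    """Rotate the quadrilateral about its centroid so its top edge becomes horizontal."""
    angle = top_edge_angle(quad)
    cx = sum(x for x, _ in quad) / 4
    cy = sum(y for _, y in quad) / 4
    return rotate_points(quad, -angle, (cx, cy))


tilted = [(0, 0), (1, 1), (0, 2), (-1, 1)]  # a square tilted by 45 degrees
print(round(top_edge_angle(tilted), 6))
print([(round(x, 4), round(y, 4)) for x, y in to_target_posture(tilted)])
```

Rotating only the selected text box, rather than the whole image, keeps the cost proportional to the text being recognized.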

Systems and methods for detecting text of interest

In some embodiments, apparatuses and methods are provided herein that are useful for detecting text of interest. In some embodiments, there is provided a system to detect vertically oriented text of interest, including at least one camera and a control circuit configured to execute a trained machine learning model to automatically detect vertically oriented text of interest on an object of interest. The trained machine learning model is trained on at least a first data set including a plurality of captured digital images, each depicting the object of interest, and a second data set including a plurality of augmented digital images, each depicting a captured digital image augmented with a synthetic text image that includes randomly generated text on a randomly selected background image.
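Generating the second (augmented) data set could start from a specification like the one below. Several assumptions are labelled here: text is drawn from uppercase letters and digits, the synthetic text image is a tall, narrow patch with one glyph per row to mimic vertical text, `glyph_px` is a hypothetical glyph size, and the patch is pasted at a random location inside the captured image. Actual rendering and compositing are not shown.

```python
import random
import string


def make_augmented_sample(captured_size, backgrounds, rng,
                          min_len=4, max_len=10, glyph_px=20):
    """Describe one augmented image: random text on a random background, pasted at random."""
    width, height = captured_size
    length = rng.randint(min_len, max_len)
    text = "".join(rng.choice(string.ascii_uppercase + string.digits) for _ in range(length))
    background = rng.choice(backgrounds)
    # Vertical text: one glyph per row, so the patch is glyph_px wide and length * glyph_px tall.
    patch_w, patch_h = glyph_px, glyph_px * length
    if patch_w > width or patch_h > height:
        raise ValueError("captured image too small for the synthetic text patch")
    x = rng.randint(0, width - patch_w)
    y = rng.randint(0, height - patch_h)
    return {"text": text, "background": background,
            "patch_box": (x, y, x + patch_w, y + patch_h)}


rng = random.Random(0)  # seeded so the synthetic data set is reproducible
print(make_augmented_sample((640, 480), ["bg_wood.png", "bg_metal.png"], rng))
```

Seeding the generator makes the synthetic data set reproducible across training runs.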