Patent classifications
G06V30/164
Method for adaptive contrast enhancement in document images
Systems and methods here may include utilizing a computer with a processor and a memory for receiving a pixelated image of an original size, converting the pixelated image to grayscale, calculating a magnitude of spatial gradients in the received pixelated grayscale image, downscaling the received pixelated grayscale image, computing a multiplicative gain correction for the downscaled received pixelated grayscale image, re-enlarging a gain multiplication for the original image, and applying the gain multiplication to the image to generate a processed image with higher contrast than the received pixelated image.
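The sequence of steps in this abstract (grayscale conversion, gradient magnitude, downscaling, gain computation, re-enlargement, gain application) can be illustrated with a minimal pure-Python sketch. The abstract does not say how the gradients feed into the gain, so the rule used here is an assumption: low-gradient pixels are treated as page background, and each block's gain maps that background toward white. All function names, the block factor `f`, `grad_thresh`, and `max_gain` are hypothetical.

```python
def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to grayscale floats (Rec. 601 luma weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb]


def gradient_magnitude(gray):
    """Forward-difference gradient magnitude; the last row/column compare with themselves."""
    h, w = len(gray), len(gray[0])
    grad = []
    for y in range(h):
        row = []
        for x in range(w):
            gx = gray[y][min(x + 1, w - 1)] - gray[y][x]
            gy = gray[min(y + 1, h - 1)][x] - gray[y][x]
            row.append((gx * gx + gy * gy) ** 0.5)
        grad.append(row)
    return grad


def block_background(gray, grad, f, grad_thresh):
    """Downscale by f: per-block mean of low-gradient pixels (assumed to be background)."""
    small = []
    for by in range(len(gray) // f):
        row = []
        for bx in range(len(gray[0]) // f):
            cells = [(by * f + dy, bx * f + dx) for dy in range(f) for dx in range(f)]
            flat = [gray[y][x] for y, x in cells if grad[y][x] < grad_thresh]
            vals = flat or [gray[y][x] for y, x in cells]  # fall back to the plain mean
            row.append(sum(vals) / len(vals))
        small.append(row)
    return small


def enhance(gray, f=2, grad_thresh=30.0, white=255.0, max_gain=4.0):
    """Apply a per-block multiplicative gain that pushes the background toward `white`."""
    h, w = len(gray), len(gray[0])
    grad = gradient_magnitude(gray)
    background = block_background(gray, grad, f, grad_thresh)
    # Assumed gain rule: white / background, clamped to max_gain.
    gain_small = [[min(max_gain, white / max(b, 1.0)) for b in row] for row in background]
    last_y, last_x = len(gain_small) - 1, len(gain_small[0]) - 1
    # Re-enlarge the gain map to the original size with nearest-neighbour lookup.
    return [[min(white, gray[y][x] * gain_small[min(y // f, last_y)][min(x // f, last_x)])
             for x in range(w)]
            for y in range(h)]
```

On a dim page with dark text, the background is lifted toward white while the text stays comparatively dark, so the difference between the two grows. A production version would typically interpolate the gain map smoothly rather than use nearest-neighbour enlargement.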
System and method for processing and identifying content in form documents
The present disclosure generally provides a system and method for processing and identifying data in a form. The system and method may distinguish between content data and background data in a form. In some aspects, the content data or background data may be removed, wherein the remaining data may be processed separately. Removal of the background data or the content data may allow for more effective or efficient character recognition of the data. In some embodiments, data may be processed on an element basis, wherein each element of the form may be labeled as background data, content data, noise, or combinations thereof. This system and method may significantly increase the ability to capture and extract relevant information from a form.
Methods, apparatuses, and computer-readable storage media for image-based sensitive-text detection
The present disclosure describes a method, an apparatus, and a non-transitory computer-readable medium for detecting sensitive text information such as privacy-related text information from a signal and modifying the signal by removing the detected sensitive text information therefrom. The apparatus receives the signal such as an image, a video clip, or an audio clip, and recognizes a text string therefrom. The apparatus then detects, from the text string, a substring based on a similarity between the substring and a regular expression, and modifies the signal by removing information related to the detected substring from the signal.
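The detect-and-remove step can be sketched on the recognized text string alone. The abstract speaks of a similarity between a substring and a regular expression; the sketch below simplifies that to exact regular-expression matching, which is an assumption, and the two patterns, the `PATTERNS` table, and the function names are hypothetical examples rather than anything specified by the patent.

```python
import re

# Hypothetical example patterns for privacy-related text.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_sensitive(text):
    """Return sorted (start, end, label) spans for every pattern match in `text`."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), label))
    return sorted(hits)


def redact(text, mask="#"):
    """Overwrite each detected span with `mask`, keeping the string length unchanged."""
    chars = list(text)
    for start, end, _label in find_sensitive(text):
        chars[start:end] = mask * (end - start)
    return "".join(chars)
```

For an image or video signal, the same spans would be mapped back to the bounding boxes of the recognized characters, and those regions would be blanked or blurred instead of the characters being masked.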
Apparatus, storage medium, and control method
High-accuracy character recognition has not been realized for a document in which the space between lines is narrow, a document in which line contact occurs at a plurality of positions, and a document in which the ratio of lines with line contact is high. Noise is removed from divided line images that are obtained by dividing a text image into line units, and the removed noise is added to a neighboring divided text line image, thus restoring the character image that has been divided across the plurality of lines. This realizes high-accuracy character recognition.
Pixel binarization apparatus, method, and storage medium
There is provided an image processing apparatus in which a part used in predetermined processing is specified with use of a relatively small binarization threshold, from parts that have been converted into black pixels through binarization processing using a relatively large binarization threshold.
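One way to read this abstract is as a two-threshold selection: binarize loosely with the large threshold, then keep only those black regions that also contain pixels passing the small threshold. The sketch below follows that reading, which is an interpretation and not the claimed method; the use of 4-connected components and the names `binarize` and `select_parts` are assumptions.

```python
from collections import deque


def binarize(gray, thresh):
    """True marks a black pixel, i.e. a gray value below `thresh`."""
    return [[v < thresh for v in row] for row in gray]


def select_parts(gray, t_large, t_small):
    """Keep 4-connected black regions (large threshold) that contain at least
    one pixel that is also black under the small threshold."""
    loose = binarize(gray, t_large)
    strict = binarize(gray, t_small)
    h, w = len(gray), len(gray[0])
    seen = [[False] * w for _ in range(h)]
    keep = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not loose[y][x] or seen[y][x]:
                continue
            # Breadth-first search over one connected component.
            component, has_dark = [], False
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                has_dark = has_dark or strict[cy][cx]
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and loose[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if has_dark:
                for cy, cx in component:
                    keep[cy][cx] = True
    return keep
```

The large threshold captures faint strokes in full, while the small threshold acts as a filter that discards regions made up only of light smudges.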
Optical character recognition support system
A computer-implemented method for increasing a recognition rate of an optical character recognition (OCR) system is provided. The method includes preprocessing by receiving an image, and extracting all vertical lines from the image. The method includes adding vertical lines at character areas of the image, extracting all horizontal lines from the image, and creating an unlined image by removing all the vertical/horizontal lines from the image. The method further includes determining a border of a vertical direction of the unlined image based on the total of pixels of rows in each column, and adding vertical/horizontal auxiliary lines between characters of the unlined image. The method also includes postprocessing by receiving garbled words of OCR output, removing noise after morphologically analyzing, replacing garbled letters with correct ones based on a frequent edit operation, and outputting the correct word, weighting results of image distance calculations based on machine learning.
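Two of the preprocessing ideas above, removing ruled lines and using per-column pixel totals to find borders, can be sketched on a binary image stored as nested lists. Treating any straight black run of at least `min_len` pixels as a ruled line is an assumption made for illustration, and the function names are hypothetical; the patent's own line extraction and auxiliary-line steps are not reproduced here.

```python
def remove_long_runs(binary, min_len):
    """Return a copy with horizontal and vertical black runs of >= min_len pixels erased."""
    h, w = len(binary), len(binary[0])
    erase = [[False] * w for _ in range(h)]
    for y in range(h):  # horizontal runs
        x = 0
        while x < w:
            if binary[y][x]:
                start = x
                while x < w and binary[y][x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        erase[y][i] = True
            else:
                x += 1
    for x in range(w):  # vertical runs
        y = 0
        while y < h:
            if binary[y][x]:
                start = y
                while y < h and binary[y][x]:
                    y += 1
                if y - start >= min_len:
                    for i in range(start, y):
                        erase[i][x] = True
            else:
                y += 1
    return [[bool(binary[y][x]) and not erase[y][x] for x in range(w)] for y in range(h)]


def column_profile(binary):
    """Number of black pixels in each column."""
    return [sum(1 for row in binary if row[x]) for x in range(len(binary[0]))]


def blank_columns(binary):
    """Columns with no ink: candidate borders between characters."""
    return [x for x, count in enumerate(column_profile(binary)) if count == 0]
```

Runs of blank columns in the unlined image mark where one character ends and the next begins, which is where auxiliary separator lines could be placed.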
Systems and methods for improving the quality of text documents using artificial intelligence
In some embodiments, an apparatus includes a memory and a processor operatively coupled to the memory. The processor is configured to receive an electronic document having a set of pages, and partition a page from the set of pages of the electronic document into a set of portions. The processor is configured to convert each portion of the set of portions into a negative image of a set of negative images. The processor is configured to produce, based on an artificial intelligence algorithm, a de-noised negative image of each negative image and convert each de-noised negative image of a set of de-noised negative images into a positive image of a set of positive images, and combine each positive image of the set of positive images to produce a de-noised page. The de-noised page has fewer artifacts than the page of the electronic document.
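The partition, negate, de-noise, restore, and combine flow can be sketched as follows. The abstract calls for an artificial intelligence algorithm as the de-noiser; this sketch substitutes a plain 3x3 median filter as a stand-in so that it stays self-contained, and that substitution, the tile sizes, and all function names are assumptions.

```python
def negative(img):
    """Invert an 8-bit grayscale image (also converts a negative back to a positive)."""
    return [[255 - v for v in row] for row in img]


def median3x3(img):
    """3x3 median filter with edge replication; stand-in for a learned de-noiser."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            row.append(sorted(window)[4])
        out.append(row)
    return out


def denoise_page(page, tile_h, tile_w, model=median3x3):
    """Split the page into tiles, de-noise each tile as a negative, and reassemble."""
    h, w = len(page), len(page[0])
    out = [[0] * w for _ in range(h)]
    for y0 in range(0, h, tile_h):
        for x0 in range(0, w, tile_w):
            tile = [row[x0:x0 + tile_w] for row in page[y0:y0 + tile_h]]
            clean = negative(model(negative(tile)))
            for dy, row in enumerate(clean):
                out[y0 + dy][x0:x0 + len(row)] = row
    return out
```

Passing a different callable as `model` swaps in another de-noiser without touching the tiling logic; a median filter will also erode thin strokes, which is one reason a learned model might be preferred in practice.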
Method and device for extracting information in histogram
The present application relates to a method and device for extracting information from a histogram for display on an electronic device. The method comprises the following steps: inputting, into the electronic device, a document that includes a histogram to be processed; detecting each element in the histogram to be processed by using a target detection method based on a Faster R-CNN model pre-stored in the electronic device; performing text recognition on each detected text element box by the electronic device to extract corresponding text information; and converting all the detected elements and text information into structured data for display on the electronic device. The method and the device can detect all the elements in the histogram through deep learning and the use of the Faster R-CNN model for target detection, thus providing a simple and effective solution for information extraction in the histogram.