G06V30/162

IMAGE PROCESSING APPARATUS THAT IDENTIFIES CHARACTER PIXEL IN TARGET IMAGE USING FIRST AND SECOND CANDIDATE CHARACTER PIXELS
20190392245 · 2019-12-26

In an image processing apparatus, a controller is configured to perform: acquiring target image data representing a target image including a plurality of pixels; determining a plurality of first candidate character pixels from among the plurality of pixels, determination of the plurality of first candidate character pixels being made for each of the plurality of pixels; setting a plurality of object regions in the target image; determining a plurality of second candidate character pixels from among the plurality of pixels, determination of the plurality of second candidate character pixels being made for each of the plurality of object regions according to a first determination condition; and identifying a character pixel from among the plurality of pixels, the character pixel being included in both the plurality of first candidate character pixels and the plurality of second candidate character pixels.
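A minimal Python sketch of the two-stage identification described in this abstract. The per-pixel test (a darkness threshold) and the per-region condition (a limit on the share of dark pixels) are illustrative assumptions, as are all function and parameter names; the abstract does not specify either test.

```python
def first_candidates(image, dark_threshold=128):
    """Per-pixel decision: a pixel is a candidate if it is dark (assumed test)."""
    return {(r, c)
            for r, row in enumerate(image)
            for c, value in enumerate(row)
            if value < dark_threshold}


def second_candidates(image, regions, dark_threshold=128, max_dark_ratio=0.5):
    """Per-region decision: every pixel of a region becomes a candidate when the
    region satisfies the (assumed) condition that at most max_dark_ratio of its
    pixels are dark. Regions are half-open rectangles (r0, c0, r1, c1)."""
    result = set()
    for r0, c0, r1, c1 in regions:
        pixels = [(r, c) for r in range(r0, r1) for c in range(c0, c1)]
        dark = sum(1 for r, c in pixels if image[r][c] < dark_threshold)
        if pixels and dark / len(pixels) <= max_dark_ratio:
            result.update(pixels)
    return result


def character_pixels(image, regions):
    """A character pixel must be in both candidate sets."""
    return first_candidates(image) & second_candidates(image, regions)
```

With these assumed tests, a uniformly dark region (for example, part of a photograph) fails the region condition, so its dark pixels are excluded even though each one passes the per-pixel test.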

HANDWRITING DETECTOR, EXTRACTOR, AND LANGUAGE CLASSIFIER
20190392207 · 2019-12-26

Disclosed are methods for handwriting recognition. In some aspects, an image representing a page of a sample document is analyzed to identify a region having indications of handwriting. The region is analyzed to determine frequencies of a plurality of geometric features within the region. The frequencies may be compared to profiles or histograms of known language types, to determine if there are similarities between the frequencies in the sample document relative to those of the known language types. In some aspects, machine learning may be used to characterize the document as a particular language type based on the frequencies of the geometric features.
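The comparison of feature frequencies against known language profiles could look roughly like the following Python sketch. Cosine similarity is an assumed choice of comparison measure, and the feature names and profile labels used with it are hypothetical; the abstract names neither.

```python
import math


def normalize(counts):
    """Turn raw feature counts into relative frequencies."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse frequency vectors."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def closest_language(sample_counts, profiles):
    """Return the profile name whose histogram is most similar to the sample."""
    sample = normalize(sample_counts)
    return max(
        profiles,
        key=lambda name: cosine_similarity(sample, normalize(profiles[name])),
    )
```

A call such as `closest_language({"loop": 10, "dot": 3}, profiles)` returns the key of the best-matching profile; a trained classifier, as the abstract also mentions, could replace this nearest-profile rule.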

IMAGE DATA EXTRACTION USING NEURAL NETWORKS
20190384970 · 2019-12-19

Embodiments of the present disclosure pertain to extracting data from images using neural networks. In one embodiment, an image is fit to a predetermined bounding window. The image is then processed with a convolutional neural network to produce a three dimensional data cube. Slices of the cube are processed by an encoder RNN, and the results concatenated. The concatenated results are processed by an attention layer with input from a downstream decoder RNN. The attention layer output is provided to the decoder RNN to generate a probability array where values in the probability array correspond to particular characters in a character set. The maximum value is selected, and translated into an output character. In one embodiment, an amount may be extracted from an image of a receipt.
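The final step, selecting the maximum value of the probability array and translating it into a character, can be sketched in Python as below. The network itself is not modeled; the function names and the example character set are assumptions for illustration.

```python
def decode_step(probabilities, charset):
    """Pick the character whose probability is highest for one decoder step."""
    if len(probabilities) != len(charset):
        raise ValueError("probability array and character set differ in length")
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return charset[best]


def decode_sequence(steps, charset):
    """Decode one character per decoder step and join them into a string."""
    return "".join(decode_step(p, charset) for p in steps)
```

For an amount on a receipt, `charset` might be the digits plus a decimal point, with one probability array emitted per output position.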

Optimization and use of codebooks for document analysis

A method of generating and optimizing a codebook for document analysis comprises: receiving a first set of document images; extracting a plurality of keypoint regions from each document image of the first set of document images; calculating local descriptors for each keypoint region of the extracted keypoint regions; clustering the local descriptors such that each center of a cluster of local descriptors corresponds to a respective visual word; generating a codebook containing a set of visual words; and optimizing the codebook by maximizing mutual information (MI) between a target field of a second set of document images and at least one visual word of the set of visual words.
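One way to read the MI-based optimization is as ranking visual words by their mutual information with the target field. The Python sketch below takes that simplified view: it assumes each visual word is reduced to a per-document presence flag and keeps the top-scoring words. Keypoint extraction and clustering are not shown, and all names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two equally long discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def select_words(presence, targets, k):
    """Keep the k visual words whose presence is most informative about targets.

    presence maps each visual word to a list of 0/1 flags, one per document;
    targets holds the target-field value of each document in the same order.
    """
    ranked = sorted(
        presence,
        key=lambda word: mutual_information(presence[word], targets),
        reverse=True,
    )
    return ranked[:k]
```

A word that appears in exactly the documents of one class carries the most information about the target, while a word spread evenly across classes scores zero and would be dropped first.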

Optical character recognition employing deep learning with machine generated training data

An optical character recognition system employs a deep learning system that is trained to process a plurality of images within a particular domain to identify images representing text within each image and to convert the images representing text to textually encoded data. The deep learning system is trained with training data generated from a corpus of real-life text segments that are generated by a plurality of OCR modules. Each of the OCR modules produces a real-life image/text tuple, and at least some of the OCR modules produce a confidence value corresponding to each real-life image/text tuple. Each OCR module is characterized by a conversion accuracy substantially below a desired accuracy for an identified domain. Synthetically generated text segments are produced by programmatically converting text strings to a corresponding image where each text string and corresponding image form a synthetic image/text tuple.

Object detection and image cropping using a multi-detector approach
11967164 · 2024-04-23

Systems, methods and computer program products for detecting objects using a multi-detector are disclosed, according to various embodiments. In one aspect, a computer-implemented method includes defining analysis profiles, where each analysis profile: corresponds to one of a plurality of detectors, and comprises: a unique set of analysis parameters and/or a unique detection algorithm. The method further includes analyzing image data in accordance with the analysis profiles; selecting an optimum analysis result based on confidence scores associated with different analysis results; and detecting objects within the optimum analysis result. According to additional aspects, the analysis parameters may define different subregions of a digital image to be analyzed; a composite analysis result may be generated based on analysis of the different subregions by different detectors; and the optimum analysis result may be based on the composite analysis result.
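The profile-per-detector structure and the confidence-based selection can be sketched in Python as follows. The `AnalysisProfile` type, the detector signature (image and parameters in, objects and confidence out), and the function names are assumptions made for illustration, not the patented implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AnalysisProfile:
    """Pairs one detector with its own set of analysis parameters."""
    name: str
    detector: Callable[[list, dict], Tuple[list, float]]
    parameters: dict


def run_profiles(image, profiles: List[AnalysisProfile]):
    """Analyze the image once per profile; collect (name, objects, confidence)."""
    results = []
    for profile in profiles:
        objects, confidence = profile.detector(image, profile.parameters)
        results.append((profile.name, objects, confidence))
    return results


def optimum_result(results):
    """Select the analysis result with the highest confidence score."""
    return max(results, key=lambda result: result[2])
```

Each detector here is any callable matching the assumed signature, so different algorithms or the same algorithm with different parameters (for example, different subregions) can be registered as separate profiles.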

Range and/or polarity-based thresholding for improved data extraction

Computerized techniques for improved binarization and extraction of information from digital image data are disclosed in accordance with various embodiments. The inventive concepts include: rendering, using a processor of the mobile device, a digital image using a plurality of binarization thresholds to generate a plurality of range-binarized digital images, wherein each rendering of the digital image is generated using a different combination of the plurality of binarization thresholds; identifying, using the processor of the mobile device, one or more range connected components within the plurality of range-binarized digital images; and identifying, using the processor of the mobile device, a plurality of text regions within the digital image based on some or all of the range connected components. Corresponding systems and computer program products are also disclosed.
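A rough Python sketch of range binarization followed by connected-component identification. It assumes each rendering keeps the pixels whose intensity lies between a pair of thresholds (one rendering per pair) and uses 4-connectivity; the abstract fixes neither choice, and the function names are hypothetical.

```python
from itertools import combinations


def range_binarize(image, low, high):
    """Mark pixels whose intensity v satisfies low <= v < high."""
    return [[1 if low <= v < high else 0 for v in row] for row in image]


def connected_components(binary):
    """Collect 4-connected groups of set pixels using an explicit stack."""
    rows = len(binary)
    cols = len(binary[0]) if binary else 0
    seen, components = set(), []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or (r, c) in seen:
                continue
            stack, component = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            components.append(sorted(component))
    return components


def range_components(image, thresholds):
    """Map each threshold pair to the components of its range-binarized image."""
    return {
        (low, high): connected_components(range_binarize(image, low, high))
        for low, high in combinations(sorted(thresholds), 2)
    }
```

Text regions would then be assembled from some or all of these components, for example by grouping components of similar size that lie on a common baseline; that step is not shown.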

System and method for detecting forgeries

A document forgery detection method comprising using at least one processor for: providing at least one histogram of gray level values occurring in at least a portion of at least one channel of an image assumed to represent a document including text, the histogram having been generated by image processing at least a portion of at least one channel of the image, the image having been sent by a remote end user to an online service over a computer network; evaluating monotony of at least a portion of the at least one histogram; and determining whether the image is authentic or forged based on at least one output of the evaluating.
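The abstract does not define how monotony is measured or how the decision is made. The Python sketch below assumes one plausible measure: counting how often the gray-level histogram switches between rising and falling, on the premise that an edited image may show an irregular, comb-like histogram. The measure, the function names, and the caller-supplied `max_changes` limit are all assumptions.

```python
def gray_histogram(pixels, levels=256):
    """Count occurrences of each gray level in a flat sequence of pixel values."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    return hist


def direction_changes(hist):
    """Count switches between rising and falling runs; flat steps are ignored."""
    changes, previous = 0, 0
    for a, b in zip(hist, hist[1:]):
        direction = (b > a) - (b < a)  # +1 rising, -1 falling, 0 flat
        if direction and previous and direction != previous:
            changes += 1
        if direction:
            previous = direction
    return changes


def is_suspicious(hist, max_changes):
    """Assumed decision rule: flag histograms with too many direction changes."""
    return direction_changes(hist) > max_changes
```

Any real limit for `max_changes` would have to be calibrated on known authentic and forged images; no value is suggested here.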

Image processing apparatus that identifies character pixel in target image using first and second candidate character pixels
10423854 · 2019-09-24

In an image processing apparatus, a controller is configured to perform: acquiring target image data representing a target image including a plurality of pixels; determining a plurality of first candidate character pixels from among the plurality of pixels, determination of the plurality of first candidate character pixels being made for each of the plurality of pixels; setting a plurality of object regions in the target image; determining a plurality of second candidate character pixels from among the plurality of pixels, determination of the plurality of second candidate character pixels being made for each of the plurality of object regions according to a first determination condition; and identifying a character pixel from among the plurality of pixels, the character pixel being included in both the plurality of first candidate character pixels and the plurality of second candidate character pixels.