Patent classifications
G06V2201/01
REGION OF INTEREST EXTRACTION FROM REFERENCE IMAGE USING OBJECT MAP
For each of a number of region-of-interest (ROI) types, ROIs are extracted from a reference image based on an object map distinguishing symbol, raster, and vector objects within the reference image. Whether the print quality of a printing device has degraded below a specified acceptable print quality level is assessed based on a comparison of the extracted ROIs within the reference image to corresponding ROIs within a test image that corresponds to the reference image and was printed by the printing device.
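A minimal Python sketch of the per-type ROI extraction, assuming the object map labels each pixel as 0 = background, 1 = symbol, 2 = raster, or 3 = vector; the label values and helper names are illustrative, not taken from the patent:

```python
import numpy as np
from scipy import ndimage

# Assumed pixel-label convention for the object map (not from the patent).
OBJECT_TYPES = {1: "symbol", 2: "raster", 3: "vector"}

def extract_rois(reference: np.ndarray, object_map: np.ndarray) -> dict:
    """Return, for each object type, a list of (bounding slices, image crop) ROIs."""
    rois = {name: [] for name in OBJECT_TYPES.values()}
    for value, name in OBJECT_TYPES.items():
        labeled, _ = ndimage.label(object_map == value)
        for sl in ndimage.find_objects(labeled):   # one slice pair per region
            rois[name].append((sl, reference[sl]))
    return rois
```

The same slices can then be applied to the printed test image so that each reference ROI is compared against the corresponding test ROI.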
Apparatuses, methods, and systems for 3-channel dynamic contextual script recognition using neural network image analytics and 4-tuple machine learning with enhanced templates and context data
In some embodiments, a method includes training a first machine learning model based on multiple documents and multiple templates associated with the multiple documents. The method further includes executing the first machine learning model to generate multiple relevancy masks, the multiple relevancy masks to remove a visual structure of the multiple templates from a visual structure of the multiple documents. The method further includes generating multiple multichannel field images to include the multiple relevancy masks and at least one of the multiple documents or the multiple templates. The method further includes training a second machine learning model based on the multiple multichannel field images and multiple non-native texts associated with the multiple documents. The method further includes executing the second machine learning model to generate multiple non-native texts from the multiple multichannel field images.
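A minimal sketch of assembling one multichannel field image, assuming the document, template, and relevancy mask are aligned single-channel arrays of equal size; the channel order and normalization are assumptions, not from the patent:

```python
import numpy as np

def multichannel_field_image(document: np.ndarray,
                             template: np.ndarray,
                             relevancy_mask: np.ndarray) -> np.ndarray:
    """Stack the relevancy mask with the document and template into an HxWx3 input."""
    channels = [document, template, relevancy_mask]
    return np.stack([c.astype(np.float32) / 255.0 for c in channels], axis=-1)
```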
Image processing apparatus
Processing a dithered image comprising a grid of pixels includes: defining an array of pixels corresponding to a sub-region of the image; performing edge detection along the rows and the columns of the array; counting the number of edges detected along the rows of the array to determine the number of horizontal edges in the array; counting the number of edges detected along the columns of the array to determine the number of vertical edges in the array; identifying whether the sub-region is dithered based on the number of horizontal and vertical edges in the array; and selectively processing the corresponding sub-region of the image based on whether the sub-region is identified as dithered. The identification step may also be based on the lengths of segments of similar pixels in the lines of the array.
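A minimal Python sketch of the row/column edge-counting test, assuming a grayscale sub-region array; the edge-step and count thresholds are illustrative, as the abstract does not specify values:

```python
import numpy as np

def is_dithered(tile: np.ndarray, edge_step: int = 32, min_edges: int = 8) -> bool:
    """Classify a sub-region as dithered from its horizontal/vertical edge counts."""
    # An "edge" here is any adjacent-pixel jump larger than edge_step.
    horizontal = np.abs(np.diff(tile.astype(int), axis=1)) > edge_step  # along rows
    vertical = np.abs(np.diff(tile.astype(int), axis=0)) > edge_step    # along columns
    # Dithered regions show dense edges in both directions; smooth or text
    # regions typically do not.
    return horizontal.sum() >= min_edges and vertical.sum() >= min_edges
```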
System and method for automatic detection and verification of optical character recognition data
Systems and methods for automatically verifying optical character recognition (OCR) detected text of a native electronic document having an image layer comprising a matrix of pixels and a text layer comprising a sequence of characters. The method includes determining a location of OCR-detected text in the text layer of the native electronic document based on a pixel-based coordinate location of the OCR-detected text in the image layer of the native electronic document. The method also includes applying the location of the OCR-detected text to the text layer of the native electronic document to detect text in the text layer corresponding to the OCR-detected text. The method also includes rendering only the detected text in the text layer as an output when the OCR-detected text does not match the detected text in the text layer, to improve accuracy of the output text.
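A minimal Python sketch of the verification step, assuming the text layer exposes characters with pixel-space bounding boxes; the Char dataclass and verify_ocr helper are illustrative stand-ins, not the patented system's API:

```python
from dataclasses import dataclass

@dataclass
class Char:
    ch: str
    x0: float
    y0: float
    x1: float
    y1: float   # pixel-space bounding box

def verify_ocr(ocr_text: str,
               ocr_bbox: tuple[float, float, float, float],
               text_layer: list[Char]) -> str:
    """Prefer the native text layer when it disagrees with the OCR result."""
    x0, y0, x1, y1 = ocr_bbox
    # Collect text-layer characters whose centers fall inside the OCR bbox.
    inside = [c.ch for c in text_layer
              if x0 <= (c.x0 + c.x1) / 2 <= x1 and y0 <= (c.y0 + c.y1) / 2 <= y1]
    detected = "".join(inside)
    # Render only the detected text-layer text when the two sources disagree.
    return detected if detected and detected != ocr_text else ocr_text
```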
Systems and methods for disambiguating a voice search query based on gestures
Systems and methods are described herein for disambiguating a voice search query by determining whether the user made a gesture while speaking a quotation from a content item and whether the user mimicked or approximated a gesture made by a character in the content item when the character spoke the words quoted by the user. If so, a search result comprising an identifier of the content item is generated. A search result representing the content item from which the quotation comes may be ranked highest among other search results returned and therefore presented first in a list of search results. If the user did not mimic or approximate a gesture made by a character in the content item when the quotation is spoken in the content item, then a search result may not be generated for the content item or may be ranked lowest among other search results.
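A minimal sketch of the resulting ranking rule, assuming an upstream model has already decided whether the user's gesture matches the character's; only the placement of the quoted content item is illustrated here:

```python
def rank_results(other_results: list[str], quoted_item: str,
                 gesture_matches: bool, drop_on_mismatch: bool = False) -> list[str]:
    """Rank the quoted content item first on a gesture match, last otherwise."""
    if gesture_matches:
        return [quoted_item] + other_results   # matched gesture: present it first
    if drop_on_mismatch:
        return other_results                   # alternatively, omit it entirely
    return other_results + [quoted_item]       # mismatched: rank it lowest
```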
Training text recognition systems
In implementations of recognizing text in images, text recognition systems are trained using noisy images that have nuisance factors applied, and corresponding clean images (e.g., without nuisance factors). Clean images serve as supervision at both feature and pixel levels, so that text recognition systems are trained to be feature invariant (e.g., by requiring features extracted from a noisy image to match features extracted from a clean image), and feature complete (e.g., by requiring that features extracted from a noisy image be sufficient to generate a clean image). Accordingly, text recognition systems generalize to text not included in training images, and are robust to nuisance factors. Furthermore, since clean images are provided as supervision at feature and pixel levels, training requires fewer training images than text recognition systems that are not trained with a supervisory clean image, thus saving time and resources.
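A minimal PyTorch sketch of the two supervision signals, assuming an encoder/decoder pair standing in for the recognizer's feature extractor and a reconstruction head; the architecture and equal loss weighting are assumptions:

```python
import torch
import torch.nn.functional as F

def training_loss(encoder, decoder, noisy, clean):
    """Feature-level plus pixel-level supervision from the clean image."""
    f_noisy = encoder(noisy)
    with torch.no_grad():                    # clean-image features act as targets
        f_clean = encoder(clean)
    feature_loss = F.mse_loss(f_noisy, f_clean)        # feature invariance
    pixel_loss = F.mse_loss(decoder(f_noisy), clean)   # feature completeness
    return feature_loss + pixel_loss
```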
INFORMATION PROCESSING DEVICE, DISCERNING METHOD, AND DISCERNING PROGRAM
An information processing device (10) acquires a plurality of ledger sheets having the same layout, compares the contents of the columns at the same position in each of the acquired ledger sheets, discriminates the type of each column according to the comparison result, and stores the information on the type of each column in a storage unit (14). The information processing device (10) also acquires position information of a processing-target ledger sheet, compares the information on the type and the content of each column with the information of a registered style for the acquired ledger sheet, discriminates the type of each column of the processing-target ledger sheet according to the comparison result, and specifies style candidates for the processing-target ledger sheet on the basis of the discrimination result.
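A minimal Python sketch of the cross-sheet column comparison, assuming each ledger sheet is represented as a list of cell strings in the same column order; the type labels are illustrative:

```python
def discriminate_columns(sheets: list[list[str]]) -> list[str]:
    """Label each column position by comparing its contents across sheets."""
    types = []
    for column in zip(*sheets):               # same position on every sheet
        if len(set(column)) == 1:
            types.append("fixed")             # identical everywhere: likely a header
        elif all(cell.replace(".", "").replace(",", "").isdigit() for cell in column):
            types.append("numeric")           # varies, numeric: likely an amount
        else:
            types.append("variable")          # varies: likely a data field
    return types
```

The resulting type sequence can then be matched against registered styles to propose style candidates for a new sheet.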
Method of character recognition in written document
A method for recognizing characters in an image of a document having at least one alphanumeric field. The method includes the steps of: enhancing the image contrast to highlight the characters in the image; detecting contours of objects in the image to create a mask that highlights the characters; segmenting the image using a connected-component tree and applying the mask thereto in order to extract the characters from the image; and performing character recognition on the extracted objects. A device for implementing the method is also disclosed.
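A minimal sketch of the pipeline's steps using OpenCV (a library choice assumed here, not named by the patent); the CLAHE settings and thresholds are illustrative, and the OpenCV 4.x return signatures are assumed:

```python
import cv2
import numpy as np

def extract_characters(gray: np.ndarray) -> list[np.ndarray]:
    """Contrast enhancement, contour mask, then connected-component crops."""
    # Step 1: enhance contrast to make characters stand out.
    enhanced = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(gray)
    _, binary = cv2.threshold(enhanced, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Step 2: detect contours and build a filled mask highlighting the characters.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    mask = np.zeros_like(gray)
    cv2.drawContours(mask, contours, -1, 255, thickness=-1)
    masked = cv2.bitwise_and(binary, mask)
    # Step 3: segment via connected components and crop each candidate character.
    _, _, stats, _ = cv2.connectedComponentsWithStats(masked)
    return [gray[y:y + h, x:x + w]
            for x, y, w, h, _ in stats[1:]]   # row 0 is the background
```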
System and method for enrichment of OCR-extracted data
A computer-implemented method and system for the enrichment of OCR-extracted data is disclosed, comprising accepting a set of extraction criteria and a set of configuration parameters at a data extraction engine. The data extraction engine captures data satisfying the extraction criteria using the configuration parameters and adapts the captured data using a set of domain-specific rules and a set of OCR error patterns. A learning engine generates learning data models from the adapted data and the configuration parameters, and the system dynamically updates the extraction criteria using the generated learning data models. The extraction criteria comprise one or more extraction templates, where an extraction template includes one of a regular expression, geometric markers, anchor text markers, or a combination thereof.
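A minimal Python sketch of one extraction template combining a regular expression with an anchor text marker, plus an OCR error-pattern cleanup pass; the field names, window size, and patterns are illustrative:

```python
import re

# Common OCR confusions used as illustrative error patterns.
OCR_ERROR_PATTERNS = [("O", "0"), ("l", "1"), ("S", "5")]

def extract_and_adapt(text: str, template: dict) -> dict:
    """Apply a regex template near its anchor text, then normalize OCR errors."""
    anchor = text.find(template["anchor"])
    if anchor < 0:
        return {}
    window = text[anchor:anchor + 200]        # crude stand-in for a geometric marker
    match = re.search(template["pattern"], window)
    if not match:
        return {}
    value = match.group(1)
    if template.get("numeric"):
        for wrong, right in OCR_ERROR_PATTERNS:   # domain-specific cleanup
            value = value.replace(wrong, right)
    return {template["field"]: value}

# Example template: an anchor text marker plus a regular expression.
invoice_total = {"field": "total", "anchor": "Total", "numeric": True,
                 "pattern": r"Total\s*[:=]?\s*([0-9OlS.,]+)"}
```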