G06V2201/01

Training Text Recognition Systems
20210241032 · 2021-08-05 · ·

In implementations of recognizing text in images, text recognition systems are trained using noisy images that have nuisance factors applied, and corresponding clean images (e.g., without nuisance factors). Clean images serve as supervision at both feature and pixel levels, so that text recognition systems are trained to be feature invariant (e.g., by requiring features extracted from a noisy image to match features extracted from a clean image), and feature complete (e.g., by requiring that features extracted from a noisy image be sufficient to generate a clean image). Accordingly, text recognition systems generalize to text not included in training images, and are robust to nuisance factors. Furthermore, since clean images are provided as supervision at feature and pixel levels, training requires fewer training images than text recognition systems that are not trained with a supervisory clean image, thus saving time and resources.
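The two supervision signals in the abstract can be sketched with toy stand-ins, assuming a simple "encoder" and "decoder" in place of the neural networks a real system would use; all function names here are illustrative, not from the patent.

```python
def encoder(image):
    # Toy feature extractor: mean and range of pixel values.
    return (sum(image) / len(image), max(image) - min(image))

def decoder(features):
    # Toy generator: reconstructs a flat image from the mean feature.
    mean, _ = features
    return [mean] * 4

def training_losses(noisy, clean):
    f_noisy, f_clean = encoder(noisy), encoder(clean)
    # Feature-level supervision: features of the noisy image should
    # match features of the clean image (feature invariance).
    invariance = sum((a - b) ** 2 for a, b in zip(f_noisy, f_clean))
    # Pixel-level supervision: features of the noisy image should be
    # enough to regenerate the clean image (feature completeness).
    recon = decoder(f_noisy)
    completeness = sum((r - c) ** 2 for r, c in zip(recon, clean))
    return invariance + completeness

clean = [1.0, 1.0, 1.0, 1.0]
noisy = [1.2, 0.8, 1.1, 0.9]
loss = training_losses(noisy, clean)
```

With both terms in the loss, a noisy image is penalized unless its features both match the clean image's features and suffice to regenerate the clean image, which is the sense in which the clean image supervises at feature and pixel levels simultaneously.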

Background noise reduction using a variable range of color values dependent upon the initial background color distribution

A method to reduce background noise in a document image. The method includes extracting, from the document image, a connected component corresponding to a background of the document image, generating a histogram of pixel values of the connected component, generating, using a non-linear mapping function based on the histogram, a non-linear probability distribution of the pixel values in the connected component, generating, based at least on a comparison between the non-linear probability distribution and a predetermined threshold, a replacement range of the pixel values, selecting, from the connected component, a pixel having a pixel value within the replacement range, and converting the pixel value of the pixel to a uniform background color.
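A minimal sketch of the claimed steps, operating on a flat list of grayscale values standing in for a background connected component; the specific non-linear mapping (squaring the normalized histogram) is an assumption, as the abstract does not fix one.

```python
from collections import Counter

def clean_background(pixels, threshold=0.05, uniform_color=255):
    # Steps 1-2: extract histogram of pixel values in the component.
    hist = Counter(pixels)
    total = len(pixels)
    # Step 3: non-linear probability distribution (illustrative choice:
    # square the relative frequency, then renormalize).
    squared = {v: (n / total) ** 2 for v, n in hist.items()}
    norm = sum(squared.values())
    prob = {v: p / norm for v, p in squared.items()}
    # Step 4: replacement range = values whose probability clears the
    # threshold, i.e. values typical of the background.
    replace = {v for v, p in prob.items() if p >= threshold}
    # Steps 5-6: convert pixels in the replacement range to a uniform
    # background color.
    return [uniform_color if v in replace else v for v in pixels]

pixels = [250, 251, 250, 250, 12, 251, 250]
result = clean_background(pixels)
```

Because the replacement range is derived from the component's own distribution, the range of colors that get flattened adapts to the initial background color distribution, as the title indicates.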

Training text recognition systems
10997463 · 2021-05-04 · ·

In implementations of recognizing text in images, text recognition systems are trained using noisy images that have nuisance factors applied, and corresponding clean images (e.g., without nuisance factors). Clean images serve as supervision at both feature and pixel levels, so that text recognition systems are trained to be feature invariant (e.g., by requiring features extracted from a noisy image to match features extracted from a clean image), and feature complete (e.g., by requiring that features extracted from a noisy image be sufficient to generate a clean image). Accordingly, text recognition systems generalize to text not included in training images, and are robust to nuisance factors. Furthermore, since clean images are provided as supervision at feature and pixel levels, training requires fewer training images than text recognition systems that are not trained with a supervisory clean image, thus saving time and resources.

Automatic correlation of items and adaptation of item attributes using object recognition

The technology includes an example method for parsing data to determine corollary objects and adapt attributes of the corollary objects using the data. In some implementations, the method may include receiving one or more images, performing text recognition to determine recognized text in the one or more images, and determining data cells containing information associated with a first item. The method may then classify one or more of the determined data cells, and may identify key cells in the determined data cells based on the classification of the one or more determined data cells. Correlations between information contained in the key cells and a second item, the second item being interchangeable with the first item, may be determined, and the method may adjust attributes associated with the second item based on defined parameters and the recognized text.
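The classify-then-transfer flow described above can be sketched with plain dictionaries standing in for recognized cells and items; the cell labels, attribute names, and the "allowed" parameter set are all illustrative.

```python
def classify_cells(cells):
    # Classify each recognized cell; cells labeled "spec" are treated
    # as key cells carrying transferable information (assumed rule).
    return [c for c in cells if c["label"] == "spec"]

def adapt_attributes(key_cells, second_item, allowed):
    # Adjust the interchangeable item's attributes, but only for
    # attributes within the defined parameters ("allowed").
    for cell in key_cells:
        if cell["name"] in allowed:
            second_item[cell["name"]] = cell["value"]
    return second_item

cells = [
    {"label": "spec", "name": "voltage", "value": "12V"},
    {"label": "spec", "name": "weight", "value": "2kg"},
    {"label": "title", "name": "heading", "value": "Part A"},
]
item_b = {"voltage": "9V"}
item_b = adapt_attributes(classify_cells(cells), item_b, allowed={"voltage"})
```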

SYSTEMS AND METHODS FOR DISAMBIGUATING A VOICE SEARCH QUERY BASED ON GESTURES

Systems and methods are described herein for disambiguating a voice search query by determining whether the user made a gesture while speaking a quotation from a content item and whether the user mimicked or approximated a gesture made by a character in the content item when the character spoke the words quoted by the user. If so, a search result comprising an identifier of the content item is generated. A search result representing the content item from which the quotation comes may be ranked highest among other search results returned and therefore presented first in a list of search results. If the user did not mimic or approximate a gesture made by a character in the content item when the quotation is spoken in the content item, then a search result may not be generated for the content item or may be ranked lowest among other search results.
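The ranking rule in the abstract can be sketched as follows: a result for the quoted content item is placed first when the user's gesture approximates the character's gesture, and last otherwise. The similarity score and its cutoff are illustrative assumptions, not values from the source.

```python
def rank_results(results, gesture_similarity, cutoff=0.8):
    # results: list of (content_id, base_score); the first entry is
    # assumed to be the content item containing the quotation.
    quoted, others = results[0], results[1:]
    if gesture_similarity >= cutoff:
        # Gesture mimicked or approximated: quoted item ranked first.
        return [quoted] + sorted(others, key=lambda r: -r[1])
    # Gesture did not match: quoted item ranked last.
    return sorted(others, key=lambda r: -r[1]) + [quoted]

results = [("movie-42", 0.5), ("movie-7", 0.9), ("movie-9", 0.6)]
matched = rank_results(results, gesture_similarity=0.95)
missed = rank_results(results, gesture_similarity=0.2)
```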

CELL CULTURE DEVICE

A cell culture device includes an incubator unit that cultures cells in a culture container, a transport unit that transports the culture container to the incubator unit, a task setting unit that sets a task relating to culture of the cells, an imaging unit serving as an example of a management information acquisition unit that acquires management information attached to the culture container, and a storage unit that stores the management information acquired by the imaging unit. The task setting unit reads, from the storage unit, the management information of a plurality of culture containers serving as targets for carrying out the task, and causes a display serving as an example of a display unit to display the management information.
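The data flow between the imaging, storage, and task setting units reduces to a small store-and-read pattern; the class and field names below are illustrative only.

```python
class CultureStore:
    """Stand-in for the storage unit holding container management info."""

    def __init__(self):
        self._info = {}

    def record(self, container_id, info):
        # Imaging unit acquires management information attached to the
        # container and stores it.
        self._info[container_id] = info

    def for_task(self, container_ids):
        # Task setting unit reads the management information of the
        # containers targeted by the task, for display.
        return [self._info[c] for c in container_ids]

store = CultureStore()
store.record("C1", {"cells": "HeLa", "passage": 3})
store.record("C2", {"cells": "CHO", "passage": 5})
display = store.for_task(["C1", "C2"])
```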

Optical character recognition improvement based on inline location determination

Techniques for optical character recognition improvement based on inline location determination are provided. The techniques include receiving a digital data stream containing a digital image. As the digital data stream arrives, a determination is made whether a number of received bytes associated with a header portion of the digital image has reached a target number. In response to determining that the number of received bytes associated with the header portion of the digital image has reached the target number, the bytes associated with the header portion of the digital image are cloned. While the digital data stream is still being received, location data is determined from the cloned bytes associated with the header portion. After the digital image has been received, text in the digital image is caused to be recognized by an optical character recognition system based, at least in part, on the location data.
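The streaming step described above can be sketched as: accumulate bytes as the image arrives, clone the header once a target byte count is reached, and parse location data from the clone before the rest of the stream finishes. The 16-byte header with a "GPS:" marker is a made-up stand-in for a real metadata format such as an Exif GPS IFD.

```python
TARGET_HEADER_BYTES = 16  # assumed header size, not from the source

def parse_location(header):
    # Illustrative parser: location follows a "GPS:" marker.
    marker = header.find(b"GPS:")
    if marker == -1:
        return None
    return header[marker + 4:].split(b";")[0].decode()

def receive_stream(chunks):
    received = bytearray()
    header_clone = None
    location = None
    for chunk in chunks:
        received.extend(chunk)
        # Clone the header bytes as soon as the target number has
        # arrived, without waiting for the full image.
        if header_clone is None and len(received) >= TARGET_HEADER_BYTES:
            header_clone = bytes(received[:TARGET_HEADER_BYTES])
            location = parse_location(header_clone)
    return bytes(received), location

chunks = [b"IMGvXYZGPS:48.85", b";rest-of-image..."]
data, loc = receive_stream(chunks)
```

Cloning the header keeps the location extraction independent of the buffer that continues to grow as the remainder of the image streams in, which is what lets the OCR step start with location data already in hand.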

Method and apparatus for verifying certificates and identities

This specification describes techniques for verifying authenticity of an image. One example method includes identifying a baseline image depicting a baseline background; identifying a comparison image depicting a card; identifying a comparison background in the comparison image, wherein the comparison background is an area of the comparison image other than an area occupied by the card; determining a probability that the baseline background matches the comparison background; determining that the probability satisfies a verification threshold; and in response to determining that the probability satisfies the verification threshold, determining that the comparison image was acquired by capturing an image of a physical card corresponding to the card depicted in the comparison image.
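A toy sketch of the background-matching check: score the overlap between the baseline background and the comparison background (here, via histogram intersection on grayscale values) and accept when the probability clears the verification threshold. The overlap measure and threshold are assumptions; the source does not fix a specific similarity model.

```python
from collections import Counter

def match_probability(baseline_bg, comparison_bg):
    h1, h2 = Counter(baseline_bg), Counter(comparison_bg)
    n1, n2 = len(baseline_bg), len(comparison_bg)
    # Histogram intersection: shared mass between the two value
    # distributions, in [0, 1].
    return sum(min(h1[v] / n1, h2[v] / n2) for v in h1.keys() | h2.keys())

def verify(baseline_bg, comparison_bg, threshold=0.7):
    return match_probability(baseline_bg, comparison_bg) >= threshold

# Same desk surface behind card in both captures vs. a different one.
same_desk = verify([200, 200, 190, 190], [200, 190, 190, 200])
other_desk = verify([200, 200, 190, 190], [20, 30, 20, 30])
```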

Apparatuses, methods, and systems for 3-channel dynamic contextual script recognition using neural network image analytics and 4-tuple machine learning with enhanced templates and context data

In some embodiments, a method includes training a first machine learning model based on multiple documents and multiple templates associated with the multiple documents. The method further includes executing the first machine learning model to generate multiple relevancy masks, the multiple relevancy masks to remove a visual structure of the multiple templates from a visual structure of the multiple documents. The method further includes generating multiple multichannel field images to include the multiple relevancy masks and at least one of the multiple documents or the multiple templates. The method further includes training a second machine learning model based on the multiple multichannel field images and multiple non-native texts associated with the multiple documents. The method further includes executing the second machine learning model to generate multiple non-native texts from the multiple multichannel field images.
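The mask-and-stack step can be sketched on tiny 2x2 "images" represented as nested lists: a relevancy mask marks where the document departs from its template's visual structure, and the mask is stacked with document and template into a multichannel field image. The shapes and the difference-based mask rule are illustrative; in the source, the masks come from a learned model.

```python
def relevancy_mask(document, template):
    # 1 where the document differs from the template (field content),
    # 0 where it matches the template's fixed visual structure.
    return [[1 if d != t else 0 for d, t in zip(drow, trow)]
            for drow, trow in zip(document, template)]

def multichannel_field_image(document, template):
    mask = relevancy_mask(document, template)
    # Channels per pixel: (document, template, mask) -- three channels,
    # echoing the "3-channel" framing in the title.
    return [[(document[i][j], template[i][j], mask[i][j])
             for j in range(len(document[0]))]
            for i in range(len(document))]

template = [[9, 9], [0, 0]]   # fixed form structure
document = [[9, 9], [7, 0]]   # same form with one filled-in field
field_image = multichannel_field_image(document, template)
```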

Training Text Recognition Systems
20200151503 · 2020-05-14 · ·

In implementations of recognizing text in images, text recognition systems are trained using noisy images that have nuisance factors applied, and corresponding clean images (e.g., without nuisance factors). Clean images serve as supervision at both feature and pixel levels, so that text recognition systems are trained to be feature invariant (e.g., by requiring features extracted from a noisy image to match features extracted from a clean image), and feature complete (e.g., by requiring that features extracted from a noisy image be sufficient to generate a clean image). Accordingly, text recognition systems generalize to text not included in training images, and are robust to nuisance factors. Furthermore, since clean images are provided as supervision at feature and pixel levels, training requires fewer training images than text recognition systems that are not trained with a supervisory clean image, thus saving time and resources.