G06V30/18086

DATA PROCESSING METHOD AND APPARATUS

A data processing method, applicable to processing images that contain text, relates to the field of artificial intelligence and includes: obtaining a first feature representation and a second feature representation, where the second feature representation is a text feature of a first text, and the first text is text content included in an image; and obtaining a third feature representation based on the first feature representation and the second feature representation by using a target encoder, where the third feature representation is used to execute a downstream task, and both the similarity between the execution result and the corresponding label and the similarity between the first feature representation and the second feature representation are used to update an image encoder.
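The abstract names two similarity signals for updating the image encoder but does not specify the loss. The following is a minimal sketch, assuming the first feature representation comes from the image encoder, that the task term is a mean squared error, and that alignment is measured by cosine similarity; `image_encoder_loss` and `align_weight` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def image_encoder_loss(first_feat, second_feat, prediction, label, align_weight=1.0):
    """Hypothetical objective combining the two signals named in the abstract.

    task term: mean squared error between the downstream-task output and its label
               (a smaller error means the result is more similar to the label).
    align term: 1 - cosine similarity between the image feature (first) and the
                text feature (second), so well-aligned features cost less.
    """
    task_term = sum((p - y) ** 2 for p, y in zip(prediction, label)) / len(label)
    align_term = 1.0 - cosine_similarity(first_feat, second_feat)
    return task_term + align_weight * align_term
```

With identical features and a correct prediction the loss is zero; orthogonal features and a wrong prediction both raise it, so minimizing it pushes the image encoder toward the text feature while still serving the downstream task.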

OBJECT IDENTIFICATION IN BIRD'S-EYE VIEW REFERENCE FRAME WITH EXPLICIT DEPTH ESTIMATION CO-TRAINING

The described aspects and implementations enable efficient detection and classification of objects with machine learning models that deploy a bird's-eye view representation and are trained using depth ground truth data. In one implementation, disclosed are systems and techniques that include obtaining images and generating, using a first neural network (NN), feature vectors (FVs) and depth distributions for pixels of the images, wherein the first NN is trained using training images and depth ground truth data for the training images. The techniques further include obtaining a feature tensor (FT) in view of the FVs and the depth distributions, and processing the obtained FT, using a second NN, to identify one or more objects depicted in the images.
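One common way to combine per-pixel feature vectors with depth distributions is an outer product that spreads each feature along the camera ray, weighted by the probability of each depth bin, and then accumulates the result into a bird's-eye-view grid. The abstract does not state that this is the method used. The following is a toy sketch under a loudly simplified geometry: image column `u` maps directly to BEV column `u` and depth bin `k` to BEV row `k`, with no camera intrinsics. `lift_to_bev` is a hypothetical name.

```python
def lift_to_bev(features, depth_probs):
    """Toy 'lift and splat' of per-pixel features into a bird's-eye-view grid.

    features:    features[v][u]    -> list of C floats (the FV of pixel (v, u))
    depth_probs: depth_probs[v][u] -> list of D floats summing to 1 (depth distribution)

    Simplifying assumption: image column u maps straight to BEV column u and
    depth bin k maps to BEV row k (no camera geometry).
    Returns bev[k][u] -> list of C floats.
    """
    rows, cols = len(features), len(features[0])
    channels = len(features[0][0])
    depth_bins = len(depth_probs[0][0])
    bev = [[[0.0] * channels for _ in range(cols)] for _ in range(depth_bins)]
    for v in range(rows):
        for u in range(cols):
            f = features[v][u]
            d = depth_probs[v][u]
            for k in range(depth_bins):
                for c in range(channels):
                    # Outer product: the feature is weighted by the probability
                    # that this pixel lies in depth bin k, summed over image rows.
                    bev[k][u][c] += d[k] * f[c]
    return bev
```

A real system would project each (pixel, depth bin) pair through the camera model into metric BEV cells; the toy mapping only shows how depth probabilities distribute a feature across depth.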

Method and apparatus to locate field labels on forms

Method and apparatus to identify field labels from filled forms using image processing to compare two or a few copies of the same kind of form, possibly filled out differently. Filled forms are processed to provide text strings, one for each line of text on each filled form. The text strings are converted to vectors. Vectors from different filled forms are compared to identify common words which are indicative of field labels. In an embodiment, a histogram may be generated to show frequency of occurrence of characters and words, the histogram values also being indicative of field labels.
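The comparison step can be illustrated with bag-of-words vectors: printed labels repeat across copies of a form, while filled-in values usually differ. The following is a minimal sketch, assuming OCR has already produced one text string per line and that a word-level histogram is enough; `candidate_field_labels` is a hypothetical name, and the abstract's character-level histogram is not modeled.

```python
from collections import Counter


def candidate_field_labels(forms):
    """forms: list of filled forms, each a list of text lines (one string per line).

    Each form's lines are turned into a bag-of-words vector (a Counter). Words
    present in every form are kept as candidate field labels.
    Returns a Counter: candidate word -> total occurrences across forms (a histogram).
    """
    per_form = []
    for lines in forms:
        vec = Counter()
        for line in lines:
            # Lower-case and strip common punctuation so "Name:" matches "name".
            words = [w.strip(".,:;") for w in line.lower().split()]
            vec.update(w for w in words if w)
        per_form.append(vec)

    common = set(per_form[0])
    for vec in per_form[1:]:
        common &= set(vec)

    histogram = Counter()
    for vec in per_form:
        for word in common:
            histogram[word] += vec[word]
    return histogram
```

For two copies of a form containing "Name: Alice" and "Name: Bob", only "name" (and other shared words) survive the intersection, while the differing values drop out.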

METHOD AND APPARATUS EMPLOYING FONT SIZE DETERMINATION FOR RESOLUTION-INDEPENDENT RENDERED TEXT FOR ELECTRONIC DOCUMENTS
20250225803 · 2025-07-10

Method and apparatus for determining font point size in bitmapped text that do not rely on the accuracy of an optical character recognition (OCR) engine or on generated heuristics (e.g., assumptions about the relative amounts of different types of text, such as capital, lowercase, ascending, and descending) to determine a likely font size. Instead, a deep learning model determines text size from features extracted from existing text, yielding a more general solution.

Detecting fields in document images

A method of detecting fields in document images includes: receiving a codebook comprising a set of visual words, each visual word corresponding to a center of a cluster of local descriptors; calculating, based on a set of user-labeled document images, for each visual word of the codebook, a respective frequency distribution of a field position of a specified labeled field with respect to the visual word; loading a document image for extraction of target fields; calculating a statistical predicate of a possible position of a target field in the document image based on the frequency distributions; and detecting, using a trained model, fields in the document image based on the calculated statistical predicate.
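The per-visual-word frequency distributions and the resulting position estimate can be sketched as an offset-voting scheme. The following is a minimal sketch under stated assumptions: keypoints are already assigned to visual words, positions are quantized into square grid cells of size `cell`, and the "statistical predicate" is modeled as a normalized vote map. `learn_offset_distributions` and `locate_field` are hypothetical names, not the patented procedure.

```python
from collections import Counter, defaultdict


def learn_offset_distributions(labeled_images, cell=10):
    """labeled_images: list of (keypoints, field_pos).
    keypoints: list of (word_id, x, y); field_pos: (x, y) of the labeled field.
    Returns word_id -> Counter of grid offsets from the keypoint's cell to the field's cell.
    """
    dist = defaultdict(Counter)
    for keypoints, (fx, fy) in labeled_images:
        for word, x, y in keypoints:
            offset = (fx // cell - x // cell, fy // cell - y // cell)
            dist[word][offset] += 1
    return dist


def locate_field(keypoints, dist, cell=10):
    """Vote for the field position in a new image; returns the best grid cell or None."""
    votes = Counter()
    for word, x, y in keypoints:
        offsets = dist.get(word)
        if not offsets:
            continue  # visual word never seen near this field during training
        total = sum(offsets.values())
        for (dx, dy), n in offsets.items():
            # Each keypoint votes with its word's normalized offset frequency.
            votes[(x // cell + dx, y // cell + dy)] += n / total
    return votes.most_common(1)[0][0] if votes else None
```

Because every keypoint votes relative to its own position, the estimate follows the layout when the whole document is shifted; multiplying the returned cell by `cell` gives approximate pixel coordinates.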

Fraud detection via automated handwriting clustering

A computer-implemented method for automatically analyzing handwritten text to determine a mismatch between a purported writer and an actual writer is disclosed. The method comprises receiving two samples of digitized handwriting, each allegedly created by one individual and then received and entered into a digital system by another. The method further comprises performing a series of feature extractions to convert the two samples into a first vector and a second vector of extracted features; automatically clustering a set of vectors, based on vector similarity, such that the first vector and the second vector are assigned to the same cluster among multiple clusters; and automatically determining that, because the same individual is associated with both the first and second samples, there is a heightened probability that this individual fraudulently created both samples. Finally, the method comprises automatically transmitting a message that flags additional samples of digitized handwriting entered into a digital system as possibly fraudulent.
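The abstract does not name a clustering algorithm. The following is a minimal sketch that substitutes simple leader clustering over cosine similarity, assuming feature extraction has already produced one vector per sample; `leader_cluster`, `flag_if_same_writer`, and the `threshold` value are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def leader_cluster(vectors, threshold=0.95):
    """Assign each vector to the first cluster whose leader is similar enough,
    otherwise start a new cluster. Returns one cluster id per vector."""
    leaders, labels = [], []
    for v in vectors:
        for cid, leader in enumerate(leaders):
            if cosine(v, leader) >= threshold:
                labels.append(cid)
                break
        else:
            leaders.append(v)
            labels.append(len(leaders) - 1)
    return labels


def flag_if_same_writer(vectors, i, j, threshold=0.95):
    """True when samples i and j, purportedly by different writers, share a cluster."""
    labels = leader_cluster(vectors, threshold)
    return labels[i] == labels[j]
```

Leader clustering is order-dependent and was chosen only for brevity; any similarity-based clustering that places both vectors in one cluster would drive the same flag.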

Method and system for detecting and extracting price region from digital flyers and promotions

This disclosure relates generally to a method and system for detecting and extracting price regions from digital flyers and promotions. In the retail business, extracting price information from digital flyers is crucial yet complicated by the nature of flyers, which come in a large variety of formats, color schemes, and font styles and carry variable text information. The method of the present disclosure detects a text region comprising price information in a set of digital flyers and promotions received as input images. Each text region is then converted into a two-color text comprising a set of white pixels and a set of black pixels. Next, the underlying price region is detected in the two-color text, and the price is extracted from the price region of each input image. Additionally, the price region detection function detects the price region accurately and extracts price values even when they have an irregular font size.
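The abstract does not say how a text region becomes two-color text. The following is a minimal sketch that substitutes Otsu's method, a standard global thresholding technique, on an 8-bit grayscale region, and assumes the minority class is the text so that light-on-dark prices end up with the same polarity as dark-on-light ones. `otsu_threshold` and `to_two_color` are hypothetical names.

```python
def otsu_threshold(pixels):
    """Otsu's method on 8-bit grayscale values: returns the threshold t that
    maximizes between-class variance (class 0: value <= t, class 1: value > t)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def to_two_color(region):
    """region: 2-D list of 8-bit gray values -> 2-D list of 0 (black) / 255 (white).
    The minority class is assumed to be text and is painted black."""
    flat = [p for row in region for p in row]
    t = otsu_threshold(flat)
    dark = sum(1 for p in flat if p <= t)
    text_is_dark = dark <= len(flat) - dark
    return [[0 if (p <= t) == text_is_dark else 255 for p in row] for row in region]
```

The polarity rule is a heuristic; flyer regions where the price occupies most of the area would need a different test.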

DETECTING FIELDS IN DOCUMENT IMAGES

A method of detecting fields in document images includes: receiving, by a processing device, a codebook comprising a set of visual words, each visual word corresponding to a center of a cluster of local descriptors, wherein each local descriptor is associated with a respective keypoint region of a first set of document images; calculating, based on a second set of document images, for each visual word of the codebook, a respective frequency distribution of a field position of a specified field with respect to the visual word; loading a document image for extraction of target fields; and detecting fields in the document image based on the calculated frequency distributions.
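The codebook this method receives consists of cluster centers of local descriptors. The following is a minimal sketch of how such a codebook could be built with plain k-means (Lloyd's algorithm), assuming descriptors are numeric vectors and using the first `k` descriptors as deterministic initial centers for illustration. `build_codebook` and `nearest_word` are hypothetical names; the abstract does not say which clustering produced the codebook.

```python
def nearest_word(descriptor, centers):
    """Index of the visual word (center) closest in squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, centers[i])),
    )


def build_codebook(descriptors, k, iterations=20):
    """Plain k-means over local descriptors; returns k centers (the visual words).
    Initial centers are the first k descriptors (deterministic, for illustration)."""
    centers = [list(d) for d in descriptors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest_word(d, centers)].append(d)
        for i, g in enumerate(groups):
            if g:  # keep the previous center if a cluster ends up empty
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers
```

Once the codebook exists, `nearest_word` maps each keypoint descriptor of a new document to a visual word, which is the input the frequency-distribution step needs.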

Image processing apparatus, method of controlling image processing apparatus, and storage medium
12488607 · 2025-12-02

An image processing apparatus capable of removing an unnecessary area from a scanned image, thereby making it easy to recognize the necessary area of the scanned image. The image processing apparatus includes a calculation unit that calculates a density value histogram based on an acquired scanned image; a setting unit that sets a necessary area density having a predetermined value range around the density value with the highest appearance frequency in the density value histogram, and sets a binarization threshold value based on the necessary area density; and a control unit that controls execution of binarization processing, which corrects areas of the scanned image whose density values are equal to or higher than the threshold value to black and areas whose density values are lower than the threshold value to white.
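The histogram, mode, range, and threshold steps can be sketched in a few lines. The following is a minimal sketch under assumptions the abstract leaves open: density values are numbers where higher means darker, the predetermined range is `mode ± margin`, and the binarization threshold is taken as the lower bound of that range. `binarize_by_mode` and `margin` are hypothetical.

```python
from collections import Counter


def binarize_by_mode(density, margin=20):
    """density: 2-D list of density values (higher = darker).

    1. Build a density-value histogram and find its mode (most frequent value).
    2. Treat [mode - margin, mode + margin] as the necessary-area density range.
    3. Use the lower bound of that range as the binarization threshold (assumption).
    Returns (image, threshold), where 1 = black (density >= threshold), 0 = white.
    """
    hist = Counter(v for row in density for v in row)
    mode = max(hist, key=hist.get)
    threshold = mode - margin
    image = [[1 if v >= threshold else 0 for v in row] for row in density]
    return image, threshold
```

Tying the threshold to the histogram mode rather than a fixed value lets the cut adapt to each scan's overall density.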

Decision system, decision method, and non-transitory storage medium

A decision system includes an identifier and a decider. Based on a captured image generated by an image capturing unit attached to a tool, the identifier identifies the work target captured as the subject of the image as one of a plurality of work targets. The decider decides whether a first work target among the plurality of work targets is a mistakable work target by comparing the first work target, by reference to at least one reference image, with a second work target also among the plurality of work targets. The first work target is a mistakable work target when the second work target is similar to it, because that similarity makes the first work target difficult to identify based on the captured image. The at least one reference image belongs to a plurality of reference images corresponding one to one to the plurality of work targets.
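The decider's comparison can be pictured as a pairwise similarity check over the one-to-one reference images. The following is a minimal sketch, assuming each reference image has already been reduced to a feature vector and that "similar" means cosine similarity at or above a fixed `threshold`. `mistakable_targets` and the example target names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mistakable_targets(references, threshold=0.9):
    """references: dict work_target -> feature vector of its reference image
    (one reference per target). A target is marked mistakable when some other
    target's reference is at least `threshold` similar to it."""
    result = {}
    for first, ref_a in references.items():
        result[first] = any(
            cosine(ref_a, ref_b) >= threshold
            for second, ref_b in references.items()
            if second != first
        )
    return result
```

For example, two bolts whose reference features nearly coincide are both marked mistakable, while a visually distinct nut is not.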