Patent classifications
G06V30/1918
OPTICAL CHARACTER RECOGNITIONS VIA CONSENSUS OF DATASETS
An example apparatus includes a memory to store a first image of a document and a second image of the document. The first image and the second image are captured under different conditions. The apparatus includes a processor coupled to the memory. The processor is to perform optical character recognition on the first image to generate a first output dataset and to perform optical character recognition on the second image to generate a second output dataset. The processor is further to determine whether consensus for a character is achieved based on a comparison of the first output dataset with the second output dataset, and generate a final output dataset based on the consensus for the character.
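The per-character consensus step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation; the function name and the placeholder convention for unresolved characters are assumptions.

```python
def consensus_ocr(dataset_a, dataset_b, placeholder="?"):
    """Merge two per-character OCR output datasets: keep each character
    where both recognitions agree (consensus achieved) and mark positions
    where they disagree with a placeholder."""
    final = []
    for ch_a, ch_b in zip(dataset_a, dataset_b):
        final.append(ch_a if ch_a == ch_b else placeholder)
    return "".join(final)

# Two OCR passes over images captured under different conditions:
# consensus_ocr("INVO1CE", "INVOICE") yields "INVO?CE"
```

In practice the unresolved positions could be filled by a third capture condition or flagged for manual review.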
Information processing apparatus and non-transitory computer readable medium for determining accuracy of analyzed input data
An information processing apparatus includes: determination units that make determinations on an input using different methods, and obtain determination results for the input; a first output unit that outputs, when a certain percentage or more of the determination results match, a determination result matched at the certain percentage or more; a second output unit that outputs, when the first output unit does not find a determination result matched at the certain percentage or more, a final determination result for the input; and an accuracy rate calculation unit that calculates, when a determination result obtained by a determination unit of interest among the determination units corresponds to a determination result matched at the certain percentage or more or matches the determination result output by the second output unit, an accuracy rate of the determination unit of interest, regarding that the determination result obtained by the determination unit of interest is correct.
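The threshold-based voting and the accuracy-rate bookkeeping in this abstract can be sketched in a few lines. This is an illustrative reading of the claim, with assumed function names and a default 50% threshold; the patent leaves the "certain percentage" open.

```python
from collections import Counter

def vote(results, threshold=0.5):
    """Return the determination result matched at `threshold` fraction or
    more of the determination units, or None when no such consensus exists
    (the second output unit would then supply a final result)."""
    value, count = Counter(results).most_common(1)[0]
    return value if count / len(results) >= threshold else None

def accuracy_rate(unit_results, final_results):
    """Fraction of inputs on which a unit of interest agreed with the final
    result, treating agreement with the consensus as 'correct'."""
    correct = sum(u == f for u, f in zip(unit_results, final_results))
    return correct / len(final_results)
```

For example, `vote(["A", "A", "B"])` returns `"A"`, while `vote(["A", "B", "C"])` returns `None` because no result reaches the threshold.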
MULTI-MODAL ELECTRONIC DOCUMENT CLASSIFICATION
A method comprising operating at least one hardware processor for: receiving, as input, a plurality of electronic documents, training a machine learning classifier based, at least in part, on a training set comprising: (i) labels associated with the electronic documents, (ii) raw text from each of said plurality of electronic documents, and (iii) a rasterized version of each of said plurality of electronic documents, and applying said machine learning classifier to classify one or more new electronic documents.
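A multi-modal training example of the kind this claim describes combines text features with rasterized-image features. The sketch below is an assumption about how the two modalities might be joined (the patent does not specify a feature scheme); the bag-of-letters text encoding and pixel normalization are illustrative only.

```python
def build_training_example(label, raw_text, raster_pixels):
    """Combine the two modalities into one feature vector: a bag-of-letters
    count for the raw text, concatenated with the flattened, normalized
    pixels of the rasterized document, plus the document's label."""
    text_feats = [raw_text.count(c) for c in "abcdefghijklmnopqrstuvwxyz"]
    image_feats = [p / 255.0 for row in raster_pixels for p in row]
    return text_feats + image_feats, label
```

Any standard classifier can then be trained on such combined vectors, which is what lets the model exploit both layout (raster) and content (text) signals.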
Apparatus for capturing and processing images
An apparatus comprising an optical sensor having a plurality of photosensitive cells arranged in a grid, wherein the optical sensor is configured to capture a first image using a first subset of the photosensitive cells and to capture a second image using a second subset of the photosensitive cells, the second subset not including any of the photosensitive cells in the first subset, wherein the apparatus is configured to process the first and second images using at least one optical character recognition algorithm.
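The disjoint-subset capture can be illustrated with a toy partition of the sensor grid. Splitting by even and odd rows is only one possible subset choice, assumed here for simplicity; the claim requires only that the two subsets share no cells.

```python
def split_grid(grid):
    """Partition a grid of photosensitive cells into two disjoint subsets
    (even rows vs. odd rows), modeling two captures of the same scene that
    use no cell in common."""
    first = [row for i, row in enumerate(grid) if i % 2 == 0]
    second = [row for i, row in enumerate(grid) if i % 2 == 1]
    return first, second
```

Each resulting image can then be run through OCR independently, giving two recognitions of the same document from physically distinct sensor cells.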
SYSTEMS AND METHODS FOR USING IMAGE ANALYSIS TO AUTOMATICALLY DETERMINE VEHICLE INFORMATION
The present disclosure is directed to systems and methods for analyzing digital images to determine alphanumeric strings depicted in the digital images. An electronic device may generate a set of filtered images using a received digital image. The electronic device may also perform an optical character recognition (OCR) technique on the set of filtered images, and may filter out any of the set of filtered images according to a set of rules. The electronic device may further identify a set of common elements representative of the alphanumeric string depicted in the digital image, and determine a machine-encoded alphanumeric string based on the set of common elements.
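The rule-based filtering and common-element steps can be sketched for a license-plate-style string. The regular-expression rule and the majority choice are assumptions for illustration; the patent speaks only of "a set of rules" and "common elements".

```python
import re
from collections import Counter

def common_string(ocr_results, pattern=r"^[A-Z0-9]{5,8}$"):
    """Discard OCR results that violate an assumed format rule (uppercase
    alphanumeric, 5-8 characters), then take the most common surviving
    string as the machine-encoded alphanumeric string."""
    valid = [r for r in ocr_results if re.match(pattern, r)]
    return Counter(valid).most_common(1)[0][0] if valid else None

# OCR outputs from differently filtered versions of one image:
# common_string(["ABC123", "ABC123", "abc123", "AB(123"]) yields "ABC123"
```

Running OCR over several filtered variants of the same image and keeping only the elements they agree on is what makes the final string robust to any single filter's failure.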
Multi-sensor calibration system
Techniques for performing multi-sensor calibration on a vehicle are described. A method includes obtaining, from each of at least two sensors located on a vehicle, a sensor data item of a road comprising a lane marker, extracting, from each sensor data item, location information of the lane marker, and calculating extrinsic parameters of the at least two sensors based on determining a difference between the location information of the lane marker from each sensor data item and previously stored location information of the lane marker.
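The "difference between detected and stored location" step can be illustrated with a simple translation estimate. Real extrinsic calibration solves for full rotation and translation; the mean-offset computation below is a deliberately reduced sketch, and the function name is an assumption.

```python
def extrinsic_offset(detected, reference):
    """Estimate a per-sensor 2D translation as the mean difference between
    lane-marker points detected by the sensor and their previously stored
    map locations. A stand-in for full extrinsic parameter estimation."""
    n = len(detected)
    dx = sum(d[0] - r[0] for d, r in zip(detected, reference)) / n
    dy = sum(d[1] - r[1] for d, r in zip(detected, reference)) / n
    return dx, dy
```

Computing such an offset per sensor against the same stored lane marker is what lets the sensors be calibrated relative to one another without a dedicated calibration target.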
Document Extraction Template Induction
A method for document extraction includes receiving, from a user device associated with a user, an annotated document that includes one or more fields. Each respective field of the one or more fields of the annotated document is labeled by a respective annotation. The method includes clustering, using a template matching algorithm, the annotated document into a cluster and inducing, using the annotated document, a document template for the cluster. The method includes receiving, from the user device, an unannotated document including the one or more fields. The method includes clustering, using the template matching algorithm, the unannotated document into the cluster and, in response to clustering the unannotated document into the cluster, extracting, using the document template, the one or more fields.
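A toy version of the template-matching clustering step: documents whose annotated fields sit in roughly the same positions should land in the same cluster. The coarse position quantization below is an assumed stand-in for whatever matching algorithm the patent actually uses.

```python
def cluster_key(field_positions, grid=50):
    """Quantize annotated field positions to a coarse grid so that documents
    with the same approximate layout (e.g. forms from one template) produce
    the same key and fall into the same cluster."""
    return tuple(sorted((x // grid, y // grid) for x, y in field_positions))
```

Once an annotated document defines a cluster, its induced template (field labels plus coarse positions) can be applied to extract the same fields from any unannotated document whose key matches.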
Channel Fusion for Vision-Language Representation Learning
Provided is an approach that aligns multi-modal tokens using cross-attention without losing the advantages of global self-attention. In contrast to previous works that concatenate the unimodal tokens along the sequence dimension, example approaches described herein align per-modality tokens by chaining them along the channels. Specifically, the tokens from one modality can be used to query the other modality and the output can be concatenated with the query tokens on the channels. An analogous process can also be repeated (or performed in parallel) where the roles of the two modalities are switched. The resulting sets of compound tokens can be concatenated and fed into a self-attention encoder such as a transformer encoder that performs self-attention.
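The channel-wise chaining can be made concrete with a small cross-attention sketch. This is a minimal single-head illustration under assumed shapes (tokens as rows, channels as columns), not the described system's architecture; the second, role-swapped pass and the downstream self-attention encoder are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def compound_tokens(q_tokens, kv_tokens):
    """Cross-attend one modality's tokens (queries) over the other's, then
    concatenate the attention output with the queries along the channel
    axis: the token count is unchanged while the channels double, unlike
    sequence-dimension concatenation which lengthens the sequence."""
    scores = q_tokens @ kv_tokens.T / np.sqrt(q_tokens.shape[1])
    attended = softmax(scores, axis=-1) @ kv_tokens
    return np.concatenate([q_tokens, attended], axis=1)

# 3 vision tokens with 4 channels, cross-attending over 5 text tokens:
# compound_tokens(np.ones((3, 4)), np.ones((5, 4))).shape is (3, 8)
```

The compound tokens from both directions can then be concatenated and fed to a standard transformer encoder, preserving global self-attention over the fused representation.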
SYSTEM AND METHOD FOR DATA EXTRACTION AND SEARCHING
Some implementations of the disclosure are directed to: extracting metadata from textual data representations of a plurality of document images, and contextualizing the extracted metadata; storing the extracted metadata and the textual data representations in a full text index database; and transferring the extracted metadata and the textual data representations from the full text index database to a search engine platform, the search engine platform indexing and storing the transferred extracted metadata to allow for searching of the indexed, extracted metadata, the indexed, extracted metadata having been correlated to the textual data representations, where the search engine platform allows for the selection of extracted metadata stored in the full text index database that is transferred to the search engine platform.
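The indexing step that makes the extracted metadata and text searchable can be sketched as a minimal inverted index. This is an illustrative toy, not the disclosed platform; real systems would use a full-text engine with tokenization, ranking, and metadata correlation.

```python
def build_index(docs):
    """A minimal inverted index mapping each token to the set of document
    ids containing it, so textual data representations (and extracted
    metadata rendered as text) can be searched by term."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index.setdefault(token, set()).add(doc_id)
    return index

# build_index({"d1": "invoice total", "d2": "invoice date"})["invoice"]
# is {"d1", "d2"}
```

Transferring such an index to a search platform then amounts to shipping the token-to-document mapping along with the correlated metadata records.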