G06V30/187

METHOD AND SYSTEM FOR FREQUENCY CODING IMAGE DATA
20220114386 · 2022-04-14

A computer-implemented method for frequency coding of image data from an imaging sensor. The method includes: supplying first image data of an individual image recorded by an imaging sensor, the first image data having depth values of the individual image coded as integers or as floating-point numbers; receiving the first image data by an algorithm that frequency codes the depth values of the individual image using a predefined number of periodic functions; and outputting second image data by the algorithm, the second image data having frequency-coded depth values of the individual image. A computer-implemented method is also described for providing a machine learning algorithm that classifies objects included in the image data of an individual image from an imaging sensor. A system for the frequency coding of image data from an imaging sensor, a computer program, and a computer-readable data carrier are also described.
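A minimal sketch of frequency coding, assuming that sine and cosine pairs at geometrically spaced frequencies serve as the "predefined number of periodic functions" (a choice familiar from positional encodings, not confirmed by the abstract). The function name `frequency_code_depth` and the normalisation parameter `max_depth` are hypothetical. Each scalar depth becomes a 2K-channel vector, so integer and floating-point inputs yield the same coded representation.

```python
import numpy as np


def frequency_code_depth(depth, num_frequencies=4, max_depth=100.0):
    """Frequency-code a depth image with sine/cosine pairs.

    Assumptions (not from the abstract): depths are normalised by a
    hypothetical `max_depth`, and the periodic functions are sin/cos at
    frequencies pi * 2**k for k = 0 .. num_frequencies - 1.
    """
    # Accept integer or floating-point depth values alike.
    d = np.asarray(depth, dtype=np.float64) / max_depth
    freqs = np.pi * 2.0 ** np.arange(num_frequencies)   # shape (K,)
    angles = d[..., np.newaxis] * freqs                  # shape (H, W, K)
    # Second image data: 2*K channels per pixel instead of one depth value.
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)


# Usage: a 2x3 integer depth image becomes a 2x3x8 frequency-coded image.
depth = np.array([[0, 25, 50], [75, 100, 10]])
coded = frequency_code_depth(depth)
print(coded.shape)  # (2, 3, 8)
```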

SCALABLE STRUCTURE LEARNING VIA CONTEXT-FREE RECURSIVE DOCUMENT DECOMPOSITION
20210081662 · 2021-03-18

An approach is provided that aggregates a set of pixel values from a bitmap image into a set of row sum values and a set of column sum values. The bitmap image is a pixelated representation of a document. The approach applies a localized Fourier transform to the set of row sum values and the set of column sum values to generate frequency representations of the set of row sum values and the set of column sum values. The approach decomposes the bitmap image into a set of image portions based on at least one separation location identified in the frequency representations, and sends the set of image portions to a text recognition system.
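The pipeline in this abstract can be sketched in Python with NumPy (1.20 or later, for `sliding_window_view`). Everything here is an assumption for illustration: the helper names are hypothetical, the "localized Fourier transform" is an untapered sliding-window FFT, and a separation location is taken to be the centre of a run of windows whose spectral energy is zero, i.e. a whitespace band. The recursion mirrors the title's "recursive document decomposition" but is not claimed to be the patented procedure.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_spectral_energy(profile, window):
    """Localized Fourier transform: FFT of each sliding window of a profile.

    Returns the total spectral energy per window position. By Parseval's
    theorem this is zero exactly when the window contains no ink.
    """
    windows = sliding_window_view(np.asarray(profile, dtype=float), window)
    spectra = np.fft.fft(windows, axis=-1)
    return (np.abs(spectra) ** 2).sum(axis=-1)


def separation_locations(profile, window=3, eps=1e-9):
    """Cut positions at the centre of each run of blank windows (assumed rule)."""
    if len(profile) < window:
        return []
    blank = np.flatnonzero(local_spectral_energy(profile, window) < eps)
    if blank.size == 0:
        return []
    # Group consecutive blank window indices into runs.
    runs = np.split(blank, np.flatnonzero(np.diff(blank) > 1) + 1)
    # Window i covers positions i .. i+window-1, so shift by window // 2 to centre it.
    cuts = [int((run[0] + run[-1]) // 2 + window // 2) for run in runs]
    # Keep only interior cuts so every split strictly shrinks the bitmap.
    return [c for c in cuts if 0 < c < len(profile)]


def spans(length, cuts):
    """Half-open intervals between consecutive cut positions."""
    bounds = [0] + sorted(cuts) + [length]
    return list(zip(bounds[:-1], bounds[1:]))


def decompose(bitmap, window=3):
    """Recursively split a binary bitmap (1 = ink) into non-blank portions."""
    row_sums = bitmap.sum(axis=1)   # one value per row
    col_sums = bitmap.sum(axis=0)   # one value per column
    row_cuts = separation_locations(row_sums, window)
    col_cuts = separation_locations(col_sums, window)
    if not row_cuts and not col_cuts:
        return [bitmap]
    portions = []
    for r0, r1 in spans(bitmap.shape[0], row_cuts):
        for c0, c1 in spans(bitmap.shape[1], col_cuts):
            part = bitmap[r0:r1, c0:c1]
            if part.any():  # drop portions that are pure whitespace
                portions.extend(decompose(part, window))
    return portions
```

For a page with two text blocks separated by several blank rows, `decompose` returns two portions, each of which could then be handed to a text recognition system.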

Methods and apparatus for imaging of layers

A sensor may measure light reflecting from a multi-layered object at different times. A digital time-domain signal may encode the measurements. Peaks in the signal may be identified. Each identified peak may correspond to a layer in the object. For each identified peak, a short time window may be selected, such that the time window includes a time at which the identified peak occurs. A discrete Fourier transform of that window of the signal may be computed. A frequency frame may be computed for each frequency in a set of frequencies in the transform. Kurtosis for each frequency frame may be computed. A set of high kurtosis frequency frames may be averaged, on a pixel-by-pixel basis, to produce a frequency image. Text characters that are printed on a layer of the object may be recognized in the frequency image, even though the layer is occluded.
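The per-peak processing can be sketched as follows, assuming the measurements arrive as a `(T, H, W)` NumPy array (one time-domain trace per pixel). Finding peaks on the spatially averaged trace, the mean-plus-one-standard-deviation threshold, and `top_k` are hypothetical choices; the abstract does not specify how peaks are identified or how many high-kurtosis frames are averaged.

```python
import numpy as np


def simple_peaks(trace, threshold):
    """Indices of local maxima above a threshold (stand-in peak finder)."""
    return [i for i in range(1, len(trace) - 1)
            if trace[i] > threshold and trace[i] >= trace[i - 1] and trace[i] > trace[i + 1]]


def kurtosis(frame):
    """Non-excess kurtosis of the pixel values in one frequency frame."""
    v = np.asarray(frame, dtype=float).ravel()
    c = v - v.mean()
    var = np.mean(c ** 2)
    return 0.0 if var == 0 else float(np.mean(c ** 4) / var ** 2)


def layer_images(signal, half_window=4, top_k=3):
    """signal: (T, H, W) time-domain measurements, one trace per pixel."""
    trace = signal.mean(axis=(1, 2))            # assumed: peaks found on the mean trace
    threshold = trace.mean() + trace.std()      # assumed threshold rule
    images = []
    for p in simple_peaks(trace, threshold):    # one peak per layer
        lo, hi = max(0, p - half_window), min(len(trace), p + half_window + 1)
        # DFT of the short window; each frequency bin yields one (H, W) frame.
        frames = np.abs(np.fft.rfft(signal[lo:hi], axis=0))
        scores = np.array([kurtosis(f) for f in frames])
        best = np.argsort(scores)[::-1][:top_k]    # highest-kurtosis frames
        images.append(frames[best].mean(axis=0))   # pixel-by-pixel average
    return images
```

Each returned `(H, W)` image is the frequency image for one layer, on which character recognition would then run.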

SYSTEMS AND METHODS FOR BLUR IDENTIFICATION AND CORRECTION

Methods and systems are described herein for identifying the location and nature of any blur within one or more images received as a user communication and generating an appropriate correction. The system uses a first machine learning model, which is trained to identify blurred components of input images and to determine whether the blurred components are located in portions of the input images that contain textual information. The system may apply a corrective action selected by the first machine learning model, which may comprise stitching blurred images together into a sharp product image and/or another method appropriate for rectifying the received images.
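The abstract relies on a trained machine learning model, which is not reproduced here. As a plainly swapped-in classical stand-in, the sketch below scores sharpness per tile with the variance of a Laplacian and then checks which low-scoring (blurry) tiles overlap text regions. The function names, the tile size, the threshold, and the `(r0, c0, r1, c1)` box format are assumptions; this heuristic also flags genuinely smooth, textureless areas as blurry.

```python
import numpy as np


def laplacian_variance_map(gray, tile=8):
    """Per-tile variance of a 4-neighbour Laplacian (classical sharpness cue)."""
    g = np.asarray(gray, dtype=float)
    lap = np.zeros_like(g)  # border pixels are left at zero
    lap[1:-1, 1:-1] = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
                       - 4.0 * g[1:-1, 1:-1])
    rows, cols = g.shape[0] // tile, g.shape[1] // tile
    scores = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            scores[i, j] = lap[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile].var()
    return scores


def blurred_text_tiles(gray, text_boxes, tile=8, threshold=10.0):
    """Tiles that look blurred AND overlap a text box (r0, c0, r1, c1), end-exclusive."""
    scores = laplacian_variance_map(gray, tile)
    hits = []
    for i, j in zip(*np.nonzero(scores < threshold)):
        t0, t1, u0, u1 = i * tile, (i + 1) * tile, j * tile, (j + 1) * tile
        for r0, c0, r1, c1 in text_boxes:
            if t0 < r1 and r0 < t1 and u0 < c1 and c0 < u1:  # rectangles intersect
                hits.append((int(i), int(j)))
                break
    return hits
```

Only tiles that are both low-sharpness and inside a text region are reported, mirroring the abstract's two questions (where is the blur, and does it affect text) without any learned model.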

CONNECTING VISION AND LANGUAGE USING FOURIER TRANSFORM
20240127616 · 2024-04-18

A method for text-image integration is provided. The method may include receiving a question related to pairable data comprising text data and image data. Embeddings are generated from the text tokens and image encodings. The embeddings include text embeddings and image embeddings. A spectral conversion of the text embeddings and the image embeddings is performed to generate spectral data. The spectral data is processed to extract text-image features. The text-image features are processed to generate inferred answers to the question.
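One way to realise a "spectral conversion" of embeddings is the token-mixing step of FNet: a 2-D discrete Fourier transform over the sequence and hidden dimensions, keeping the real part. The sketch assumes that choice, concatenates the text and image embeddings so that every output position mixes both modalities, and mean-pools the result into one feature vector. The function name and the pooling step are hypothetical, and answer generation is out of scope.

```python
import numpy as np


def spectral_text_image_features(text_emb, image_emb):
    """FNet-style spectral mixing of concatenated text and image embeddings.

    text_emb: (n_text, d), image_emb: (n_image, d), same hidden size d.
    Assumption: the spectral conversion is Re(FFT2) over (sequence, hidden),
    followed by mean pooling over the sequence.
    """
    tokens = np.concatenate([text_emb, image_emb], axis=0)  # (n_text + n_image, d)
    spectral = np.fft.fft2(tokens).real                      # mixes every token with every other
    return spectral.mean(axis=0)                             # pooled text-image features, shape (d,)
```

The transform has no learned parameters, which is the usual motivation for Fourier mixing: it is cheap compared with attention while still letting text and image tokens interact.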

Method and apparatus for transformation of dot text in an image into stroked characters based on dot pitches
10223618 · 2019-03-05

A method and apparatus for determining the orientation and dot pitch of characters in an image. A statistical neighborhood of a set of dots of an image is determined. The statistical neighborhood includes a set of points, and each point is associated with a position and a statistical measure indicative of the likelihood that one or more dots satisfying shape and size criteria are located at that position. A Fast Fourier Transform (FFT) is computed across the set of points of the statistical neighborhood, and, based on the FFT of the set of points, a first orientation and a first distance between adjacent dots of characters along the first orientation, and a second orientation and a second distance between adjacent dots of the characters along the second orientation, are determined.
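The FFT step can be illustrated on a square likelihood map. In this sketch (hypothetical function name and thresholds), strong non-DC peaks of the 2-D FFT are sorted by frequency radius; the first peak gives one orientation, the next non-parallel peak gives the second, and a peak at frequency vector k corresponds to dots repeating every N/|k| pixels along k. Real dot text would need tolerance for noise and non-integer pitches, which this sketch does not handle.

```python
import numpy as np


def dot_orientations_and_pitches(neighborhood, rel_threshold=0.5, angle_tol=1e-3):
    """Estimate two dot orientations (degrees) and pitches (pixels) from an (N, N) map.

    Assumptions: strong FFT peaks are those above rel_threshold * max, and the
    fundamental of each direction is the strong peak with the smallest radius.
    """
    n = neighborhood.shape[0]
    if neighborhood.shape != (n, n):
        raise ValueError("expected a square neighborhood")
    mag = np.abs(np.fft.fft2(neighborhood))
    mag[0, 0] = 0.0                                    # ignore the DC term
    freq = np.fft.fftfreq(n) * n                       # signed integer frequencies
    ky, kx = np.meshgrid(freq, freq, indexing="ij")    # row and column frequencies
    radius = np.hypot(kx, ky)
    peaks = np.argwhere(mag >= rel_threshold * mag.max())
    peaks = sorted(peaks, key=lambda rc: radius[rc[0], rc[1]])
    found = []
    for r, c in peaks:
        # Orientation is defined modulo 180 degrees (k and -k describe the same wave).
        angle = np.degrees(np.arctan2(ky[r, c], kx[r, c])) % 180.0
        if all(min(abs(angle - a), 180.0 - abs(angle - a)) > angle_tol for a, _ in found):
            # A wave with frequency vector k repeats every n / |k| pixels along k.
            found.append((float(angle), float(n / radius[r, c])))
        if len(found) == 2:
            break
    return found
```

On a 24 by 24 map with dots every 6 rows and every 4 columns, the function should report roughly 90° with a pitch of 6 and 0° with a pitch of 4 (angles measured from the column axis, with rows increasing downward).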

Method and apparatus for locating dot text in an image
10176400 · 2019-01-08

A method and apparatus for locating dot text in an image are described. A set of dots is extracted. A determination is made of whether a first region of interest (ROI) including the set of dots satisfies selection criteria, where the first ROI is oriented based on the results of a principal component analysis of the set of dots. Responsive to determining that the first ROI does not satisfy the selection criteria, the following operations are performed: an outlier dot is removed from the set of dots to obtain a second set of dots, and a second ROI including the second set of dots is determined; when the second ROI satisfies the selection criteria, the second ROI is output as the location of the dot text in the image; and when the second ROI does not satisfy the selection criteria, the operations are repeated until a resulting ROI is determined to satisfy the selection criteria.
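The iterative loop can be sketched with NumPy. The selection criterion (ROI extent along the minor principal axis at most `max_height`) and the outlier rule (drop the dot farthest from the centroid) are assumptions, as are the helper names; the abstract states neither.

```python
import numpy as np


def oriented_roi(dots):
    """PCA-oriented bounding box: centre, axes (major first), min/max projections."""
    centre = dots.mean(axis=0)
    cov = np.cov((dots - centre).T)
    _, vecs = np.linalg.eigh(cov)      # eigenvectors as columns, ascending eigenvalues
    axes = vecs[:, ::-1]               # reorder so the major axis comes first
    proj = (dots - centre) @ axes      # coordinates in the PCA frame
    return centre, axes, proj.min(axis=0), proj.max(axis=0)


def satisfies(roi, max_height):
    """Assumed criterion: the ROI is thin along the minor axis, like a text line."""
    _, _, lo, hi = roi
    return (hi[1] - lo[1]) <= max_height


def locate_dot_text(dots, max_height=3.0, min_dots=3):
    """Remove outliers one at a time until the oriented ROI passes the criterion."""
    dots = np.asarray(dots, dtype=float)
    roi = oriented_roi(dots)
    while not satisfies(roi, max_height):
        if len(dots) <= min_dots:
            return None  # give up: too few dots left to form text
        centre = dots.mean(axis=0)
        far = np.argmax(np.linalg.norm(dots - centre, axis=1))  # assumed outlier rule
        dots = np.delete(dots, far, axis=0)
        roi = oriented_roi(dots)
    return roi, dots
```

For a slightly jagged row of ten dots plus one stray dot far above it, the first ROI is stretched toward the stray dot and fails the criterion; removing that dot yields a thin ROI around the ten text dots.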