G06V30/153

METHOD AND APPARATUS FOR RETRIEVING TARGET

A method and an apparatus for retrieving a target are provided. The method may include: obtaining at least one image and a description text of a designated object; extracting image features of the image and text features of the description text by using a pre-trained cross-media feature extraction network; and matching the image features with the text features to determine an image that contains the designated object.
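The matching step can be illustrated with a minimal sketch. It assumes the image and text features are already available as plain vectors in a shared space and uses cosine similarity with a fixed threshold; the abstract names neither the similarity measure nor a threshold, and `cosine_similarity`, `match_images`, and `threshold` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_images(image_features, text_feature, threshold=0.8):
    """Return indices of images whose features lie close to the text feature.

    The threshold is an illustrative assumption, not a value from the abstract.
    """
    return [
        index
        for index, feature in enumerate(image_features)
        if cosine_similarity(feature, text_feature) >= threshold
    ]
```

In this sketch the feature extraction network is out of scope; any model that maps images and text into one vector space could supply the inputs.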

Character recognizing apparatus and non-transitory computer readable medium
11568659 · 2023-01-31

A character recognizing apparatus includes an acquiring unit, an identifying unit, and a character recognizing unit. The acquiring unit acquires a string image that is an image of a string generated in accordance with one of multiple string generation schemes. The identifying unit identifies a range specified for a result of character recognition in each of the multiple string generation schemes. The character recognizing unit performs first character recognition on the string image, and if a result of the first character recognition has a feature of a particular string generation scheme of the multiple string generation schemes, the character recognizing unit performs second character recognition on the string image within the range specified for a result of character recognition in the particular string generation scheme.
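The two-pass flow can be sketched as follows, under strong simplifying assumptions: each character position is represented by a list of scored candidates rather than pixels, the "feature" of a scheme is a fixed prefix, and the "range" is an allowed character set. `SCHEMES`, `recognize`, and `two_pass` are hypothetical names, and none of these representations come from the abstract.

```python
# Hypothetical string generation schemes: a prefix that identifies the scheme
# and the set of characters that the scheme can produce.
SCHEMES = {
    "numeric-id": {"prefix": "N", "allowed": set("N0123456789")},
}


def recognize(candidates, allowed=None):
    """Pick the best-scoring candidate per position, optionally within `allowed`."""
    result = []
    for position in candidates:
        options = [(ch, score) for ch, score in position
                   if allowed is None or ch in allowed]
        if not options:  # no candidate falls in the range: keep unrestricted ones
            options = position
        result.append(max(options, key=lambda pair: pair[1])[0])
    return "".join(result)


def two_pass(candidates, schemes=SCHEMES):
    """First pass unrestricted; second pass restricted if a scheme feature is seen."""
    first = recognize(candidates)
    for scheme in schemes.values():
        if first.startswith(scheme["prefix"]):
            return recognize(candidates, scheme["allowed"])
    return first
```

With this setup, a first-pass result such as `NOl` is re-read as `N01` once the prefix identifies the numeric scheme.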

Method, system, and non-transitory computer readable record medium for extracting and providing text color and background color in image
11568631 · 2023-01-31

A method for extracting and providing a text color and a background color in an image includes detecting a first area that includes text in a given image; extracting, from the first area, a representative text color that represents the text and a representative background color that represents a background of the first area; and overlaying a second area that includes a translation result of the text on the given image and applying the representative text color and the representative background color to a text color and a background color of the second area.
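One simple way to pick the two representative colors is sketched below. It assumes the first area is given as a flat list of RGB tuples and that the most frequent color is the background while the second most frequent is the text. This heuristic is an illustrative assumption; the abstract does not say how the representative colors are chosen, and `representative_colors` is a hypothetical name.

```python
from collections import Counter


def representative_colors(pixels):
    """Return (text_color, background_color) for a list of RGB tuples.

    Assumption: background is the most common color, text the second most common.
    """
    if not pixels:
        raise ValueError("the text area contains no pixels")
    ranked = Counter(pixels).most_common(2)
    background = ranked[0][0]
    # A single-color area has no distinguishable text color; fall back to it.
    text = ranked[1][0] if len(ranked) > 1 else background
    return text, background
```

The returned pair could then be applied to the overlay that carries the translation result, so that the overlay visually matches the original text area.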

Systems and methods for separating ligature characters in digitized document images
11710331 · 2023-07-25

Embodiments disclosed herein provide for systems and methods of separating characters associated with ligatures in digitized documents. The systems and methods provide for a ligature detection engine configured to identify the ligatures, and a ligature processing engine configured to identify and remove the glyphs attaching the separate characters forming the ligature.

Generating event logs from video streams

A process mining system performs process mining using visual logs generated from video streams of worker devices. Specifically, for a given worker device, the process mining system obtains a series of images capturing a screen of a worker device while the worker device processes one or more tasks related to an operation process. The process mining system determines activity labels for a plurality of images. An activity label for an image may indicate an activity performed on the worker device when the image was captured. The activity label is determined by extracting information from pixels of the image and inferring the activity of the worker device from the extracted information. The process mining system generates event logs from the visual logs of worker devices and uses the event logs for process mining.
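A minimal sketch of turning per-image activity labels into an event log is given below. It assumes the labels have already been inferred and that consecutive images with the same label belong to one event; the merging rule, the record fields, and the name `build_event_log` are assumptions not stated in the abstract.

```python
from itertools import groupby


def build_event_log(labeled_frames):
    """Collapse runs of identical activity labels into events.

    labeled_frames: list of (timestamp, activity_label), sorted by timestamp.
    """
    events = []
    for label, run in groupby(labeled_frames, key=lambda frame: frame[1]):
        frames = list(run)
        events.append({
            "activity": label,
            "start": frames[0][0],   # timestamp of the first image in the run
            "end": frames[-1][0],    # timestamp of the last image in the run
        })
    return events
```

Each resulting record has the activity, start, and end fields that process mining tools typically expect, with a case identifier left out for brevity.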

IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND NON-TRANSITORY RECORDING MEDIUM
20230027065 · 2023-01-26

An image processing apparatus includes circuitry to set first upper limit values for vertical and horizontal sizes of a character included in image data for erecting direction determination, segment the image data in units of character into a plurality of rectangular areas, determine, in the image data, a plurality of first rectangular areas each of which satisfies the first upper limit values, perform character recognition on characters in the plurality of first rectangular areas in four directions of a +X direction, a −X direction, a +Y direction, and a −Y direction, calculate degrees of certainty of the four directions, determine whether a direction having a highest degree of certainty among the calculated degrees of certainty of the four directions is an erecting direction of the image data to output a determination result, and perform, along the erecting direction, character recognition on characters in a plurality of second rectangular areas of the image data, the plurality of second rectangular areas satisfying second upper limit values for the vertical and horizontal sizes smaller than the first upper limit values for erecting direction determination.
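The direction decision can be sketched as follows, assuming each first rectangular area has already yielded one certainty score per direction. Summing the scores per direction and requiring a minimum margin over the runner-up are illustrative assumptions; the abstract only states that the direction with the highest certainty is tested as the erecting direction. `erecting_direction` and `min_margin` are hypothetical names.

```python
DIRECTIONS = ("+X", "-X", "+Y", "-Y")


def erecting_direction(certainties, min_margin=0.1):
    """Return the most certain direction, or None if the result is ambiguous.

    certainties: list of dicts, one per rectangular area, mapping each
    direction to a recognition certainty in [0, 1].
    """
    if not certainties:
        return None
    totals = {d: sum(area[d] for area in certainties) for d in DIRECTIONS}
    ranked = sorted(DIRECTIONS, key=lambda d: totals[d], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    # Assumed acceptance test: mean certainty must beat the runner-up by a margin.
    if (totals[best] - totals[runner_up]) / len(certainties) < min_margin:
        return None
    return best
```

A caller would run full character recognition on the smaller second rectangular areas only after this function returns a direction.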

DESIGN OPTIMIZATION AND USE OF CODEBOOKS FOR DOCUMENT ANALYSIS
20230028992 · 2023-01-26

A method of generating and optimizing a codebook for document analysis comprises: receiving a first set of document images; extracting a plurality of keypoint regions from each document image of the first set of document images; calculating local descriptors for each keypoint region of the extracted keypoint regions; clustering the local descriptors such that each center of a cluster of local descriptors corresponds to a respective visual word; generating a codebook containing a set of visual words; and optimizing the codebook by maximizing mutual information (MI) between a target field of a second set of document images and at least one visual word of the set of visual words.
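The MI-based optimization step can be illustrated with a small sketch. It assumes each visual word is reduced to a binary presence vector over the second set of documents and that the target field is a binary label per document, then keeps the words with the highest mutual information. This ranking is one possible reading of "maximizing MI", not necessarily the claimed procedure, and `mutual_information` and `select_words` are hypothetical names.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_words(word_presence, target, k):
    """Keep the k visual words whose presence is most informative about the target.

    word_presence: dict mapping a visual word id to its presence vector.
    """
    scores = {word: mutual_information(presence, target)
              for word, presence in word_presence.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A word whose presence perfectly tracks the target scores 1 bit here, while a word that is statistically independent of the target scores 0.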

SYSTEMS AND METHODS OF IMAGE SEARCHING
20230229690 · 2023-07-20

Systems and methods of image searching include receiving content, receiving a request to select an image from content, selecting a plurality of items in the image, retrieving information about the selected item, and providing display data based on the retrieved information.

Phrase recognition model for autonomous vehicles

Aspects of the disclosure relate to training and using a phrase recognition model to identify phrases in images. As an example, a selected phrase list that includes a plurality of phrases is received. Each phrase of the plurality of phrases includes text. An initial plurality of images may be received. A training image set may be selected from the initial plurality of images by identifying the phrase-containing images that include one or more phrases from the selected phrase list. Each given phrase-containing image of the training image set may be labeled with information identifying the one or more phrases from the selected phrase list included in the given phrase-containing image. The model may be trained based on the training image set such that the model is configured to, in response to receiving an input image, output data indicating whether a phrase of the plurality of phrases is included in the input image.
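The selection and labeling of the training image set can be sketched as below. It assumes the text visible in each image is already available as a string (for example from a prior OCR pass) and uses case-insensitive substring matching; both are assumptions, and `build_training_set` is a hypothetical name. Model training itself is not shown.

```python
def build_training_set(images, phrase_list):
    """Select phrase-containing images and label them with the phrases found.

    images: dict mapping an image id to the text visible in that image.
    Returns a dict mapping each selected image id to its list of phrases.
    """
    training_set = {}
    for image_id, text in images.items():
        lowered = text.lower()
        found = [phrase for phrase in phrase_list if phrase.lower() in lowered]
        if found:  # images without any listed phrase are left out
            training_set[image_id] = found
    return training_set
```

The resulting labels could serve as multi-label targets, one output per phrase in the selected phrase list.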

Systems and methods for digitized document image data spillage recovery
11704925 · 2023-07-18

Systems and methods for digitized document image data spillage recovery are provided. One or more memories may be coupled to one or more processors, the one or more memories including instructions operable to be executed by the one or more processors. The one or more processors may be configured to capture an image; process the image through at least a first pass to generate a first contour; remove a preprinted bounding region of the first contour to retain text; generate one or more pixel blobs by applying one or more filters to smudge the text; identify the one or more pixel blobs that straddle one or more boundaries of the first contour; resize the first contour to enclose spillage of the one or more pixel blobs; overlay the text from the image within the resized contour; and apply pixel masking to the resized contour.
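The contour resizing step can be sketched with axis-aligned boxes. The sketch assumes the first contour and the pixel blobs are given as `(x0, y0, x1, y1)` rectangles and treats a blob as straddling when it overlaps the contour without being fully inside it. The rectangle representation and the names `intersects`, `contains`, and `resize_contour` are assumptions; the filtering and masking steps are omitted.

```python
def intersects(a, b):
    """True if boxes a and b, given as (x0, y0, x1, y1), overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def contains(outer, inner):
    """True if box `inner` lies entirely within box `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def resize_contour(contour, blobs):
    """Grow the contour so that it encloses every blob straddling its boundary."""
    x0, y0, x1, y1 = contour
    for blob in blobs:
        if intersects(contour, blob) and not contains(contour, blob):
            x0, y0 = min(x0, blob[0]), min(y0, blob[1])
            x1, y1 = max(x1, blob[2]), max(y1, blob[3])
    return (x0, y0, x1, y1)
```

Blobs that lie fully inside or fully outside the original contour leave it unchanged, so only genuine spillage enlarges the region.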