G06V30/19147

System and method for multi-modal image classification

Systems and methods for classifying images (e.g., ads) are described. An image is accessed. Optical character recognition is performed on at least a first portion of the image. Image recognition is performed via a convolutional neural network on at least a second portion of the image. At least one class for the image is automatically identified, via a fully connected neural network, based on one or more predictions, each of the one or more predictions being based on both the optical character recognition and the image recognition. Finally, the at least one class identified for the image is output.
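The abstract describes a late-fusion pipeline: OCR features and CNN features feed a fully connected classifier. A minimal sketch of that fusion step, with all shapes, names, and the concatenation strategy being illustrative assumptions rather than details from the patent:

```python
import numpy as np

def classify_multimodal(text_feat, image_feat, W, b):
    """Score classes by fusing an OCR-derived text feature vector with a
    CNN-derived image feature vector through one fully connected layer.
    All names and shapes here are illustrative, not from the patent."""
    fused = np.concatenate([text_feat, image_feat])  # simple concatenation fusion
    logits = W @ fused + b                           # fully connected layer
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()                          # softmax over candidate classes
    return int(np.argmax(probs)), probs
```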

Domain alignment for object detection domain adaptation tasks

A domain alignment technique for cross-domain object detection tasks is introduced. During a preliminary pretraining phase, an object detection model is pretrained to detect objects in images associated with a source domain using a source dataset of images associated with the source domain. After completing the pretraining phase, a domain adaptation phase is performed using the source dataset and a target dataset to adapt the pretrained object detection model to detect objects in images associated with the target domain. The domain adaptation phase may involve the use of various domain alignment modules that, for example, perform multi-scale pixel/patch alignment based on input feature maps or perform instance-level alignment based on input region proposals.
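The abstract names the alignment modules but not their internals. As a toy stand-in, an alignment signal can be computed per scale by comparing source- and target-domain feature statistics; the statistic (per-channel means) and the squared-gap penalty below are illustrative assumptions, not the patent's method:

```python
import numpy as np

def multiscale_alignment_loss(src_maps, tgt_maps):
    """Toy multi-scale alignment signal: at each scale, penalize the squared
    gap between per-channel mean activations of source- and target-domain
    feature maps. A stand-in for the patent's alignment modules, whose
    internals the abstract does not specify."""
    total = 0.0
    for s, t in zip(src_maps, tgt_maps):             # one (C, H, W) map per scale
        total += float(np.mean((s.mean(axis=(1, 2)) - t.mean(axis=(1, 2))) ** 2))
    return total / len(src_maps)
```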

TRAINING METHOD OF TEXT RECOGNITION MODEL, TEXT RECOGNITION METHOD, AND APPARATUS

The present disclosure provides a training method of a text recognition model, a text recognition method, and an apparatus, relating to the technical field of artificial intelligence, and specifically to the technical fields of deep learning and computer vision, which can be applied in scenarios such as optical character recognition. The specific implementation solution is: performing mask prediction on visual features of an acquired sample image, to obtain a predicted visual feature; performing mask prediction on semantic features of acquired sample text, to obtain a predicted semantic feature, where the sample image includes text; determining a first loss value of the text of the sample image according to the predicted visual feature; determining a second loss value of the sample text according to the predicted semantic feature; and training, according to the first loss value and the second loss value, to obtain the text recognition model.
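The training objective combines a loss on the predicted visual feature with a loss on the predicted semantic feature. A minimal sketch of that combination, using plain L2 losses and weights as illustrative choices (the abstract does not specify the loss forms):

```python
def dual_mask_loss(vis_pred, vis_true, sem_pred, sem_true, w_vis=1.0, w_sem=1.0):
    """Training objective sketched from the abstract: a first loss on the
    predicted visual feature plus a second loss on the predicted semantic
    feature. Plain L2 losses and the weights are illustrative choices."""
    l_vis = sum((p - t) ** 2 for p, t in zip(vis_pred, vis_true)) / len(vis_true)
    l_sem = sum((p - t) ** 2 for p, t in zip(sem_pred, sem_true)) / len(sem_true)
    return w_vis * l_vis + w_sem * l_sem
```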

DATA CLASSIFICATION BASED ON RECURSIVE CLUSTERING
20220414369 · 2022-12-29 ·

Methods and systems are presented for providing a machine learning model framework configured to perform complex data classifications. Upon receiving a request for classifying data, the data is recursively assigned to one or more clusters. During each iteration of clustering assignment, a set of clusters is selected based on a previously assigned cluster for the data, and the data is then assigned to a particular cluster from the selected set of clusters. The machine learning model framework also includes a plurality of machine learning models configured to perform simple data classifications. A particular machine learning model is selected from the plurality of machine learning models based on the one or more clusters to which the data is assigned. The particular machine learning model is then used to classify the data.
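The recursive assignment described above can be sketched as a walk down a cluster tree, where each level's candidate set is the children of the previously assigned cluster; the tree layout, centroid lookup, and model registry below are all hypothetical:

```python
def assign_recursively(vec, children_of, centroid_of, root="root"):
    """Recursive cluster assignment: at each level the candidate set is the
    children of the previously assigned cluster, and the data point goes to
    the nearest centroid. Tree layout and names are hypothetical."""
    path, node = [], root
    while children_of.get(node):
        node = min(children_of[node],
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, centroid_of[c])))
        path.append(node)
    return path

def select_model(path, model_for_cluster):
    """Pick the specialist (simple) classifier registered for the deepest
    cluster on the assignment path."""
    for node in reversed(path):
        if node in model_for_cluster:
            return model_for_cluster[node]
    return None
```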

OPTICAL CHARACTER RECOGNITION TRAINING WITH SEMANTIC CONSTRAINTS

A method, computer system, and a computer program product for optical character recognition training are provided. A text image and plain text labels for the text image may be received. The text image may include words. The plain text labels may include machine-encoded text corresponding to the words. Semantic feature vectors for the words, respectively, may be generated based on the plain text label. The text image, the plain text labels, and the semantic feature vectors may be input together into a machine learning model to train the machine learning model for optical character recognition. The plain text labels and the semantic feature vectors may be constraints for the training.
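One plausible way to realize the semantic vectors "as constraints" is an auxiliary penalty tying a predicted word representation to the label-derived semantic vector; the cosine-distance form and weighting below are assumptions, since the abstract does not fix how the constraint enters the loss:

```python
import math

def semantic_constraint(pred_vec, label_vec):
    """Cosine-distance penalty tying a predicted word representation to the
    semantic feature vector built from the plain-text label (one plausible
    way to realize the 'constraint'; the abstract does not fix the form)."""
    dot = sum(p * q for p, q in zip(pred_vec, label_vec))
    norm = math.sqrt(sum(p * p for p in pred_vec)) * math.sqrt(sum(q * q for q in label_vec))
    return 1.0 - dot / norm

def constrained_ocr_loss(recognition_loss, pred_vec, label_vec, weight=0.1):
    """Recognition loss augmented with the semantic constraint."""
    return recognition_loss + weight * semantic_constraint(pred_vec, label_vec)
```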

IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM
20220406081 · 2022-12-22 ·

A training image is generated that reflects the way a hane (a flicked stroke ending) occurs in actual handwriting. Among the line segments constituting a handwritten character in a character image, a line segment at which a handwritten hane may occur is detected. Then, a training image is generated by adding a simulated hane to the end portion of the detected line segment.
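On a toy binary raster, adding a simulated hane amounts to painting a short flick of ink starting at the detected stroke end; the direction and length below are illustrative parameters, not values from the patent:

```python
def add_simulated_hane(img, end_row, end_col, length=3):
    """Add a short upward 'flick' of ink starting at a detected stroke end
    point; img is a toy 2D list of 0/1 pixels. Direction and length are
    illustrative parameters, not values from the patent."""
    for i in range(1, length + 1):
        r, c = end_row - i, end_col + i          # flick up and to the right
        if 0 <= r < len(img) and 0 <= c < len(img[0]):
            img[r][c] = 1
    return img
```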

IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM
20220406082 · 2022-12-22 ·

In a scene where a pseudo character image is generated by performing deformation processing on a character image, the generation of character images that impede training is suppressed. Based on a condition relating to a deformation parameter associated with a first class, a parameter of the deformation processing is determined, and the deformation processing is performed on a character image belonging to the first class using the determined parameter. Whether or not the deformed character image generated by the deformation processing is similar to a character image belonging to a class different from the first class is then determined, and in a case where similarity is determined, the condition associated with the first class is updated.
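The check-and-update loop described above can be sketched as: deform with a parameter drawn from the class's allowed range, and if the result resembles a different class, tighten the range and retry. The halving update and the callable interfaces are illustrative assumptions:

```python
def deform_with_guard(image, cls, bounds, deform, looks_like_other_class, tries=10):
    """Sketch of the update rule from the abstract: draw a parameter from the
    class's allowed range; if the deformed image resembles a different class,
    tighten the range (the 'condition') and retry. All callables and the
    halving update are illustrative."""
    lo, hi = bounds[cls]
    for _ in range(tries):
        p = (lo + hi) / 2.0
        candidate = deform(image, p)
        if not looks_like_other_class(candidate, cls):
            return candidate, bounds
        hi = p                                   # update the condition for this class
        bounds[cls] = (lo, hi)
    return image, bounds                         # give up: keep the original image
```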

IMAGE PROCESSING APPARATUS, CONTROL METHOD THEREOF, AND STORAGE MEDIUM
20220406083 · 2022-12-22 ·

A training image that simulates a character block intruded upon by part of a character from another row is generated efficiently. Noise is added in the vicinity of an end portion of a character image representing a handwritten character, so as to reproduce noise that appears to be caused by part of a character from another row entering the image.

Method for determining a confidence value of an object of a class
11531832 · 2022-12-20 ·

A method is described for determining a confidence value for an object of a class determined by a neural network in an input image. The method includes: preparing an activation signature with the aid of a multiplicity of output images of a layer of the neural network for the class of the object, with the input image being provided to the input of the neural network; scaling the activation signature to the size of the input image; and comparing the portion of the activation signature's area that overlaps the area of an object frame, relative to the total area of the activation signature, in order to determine the confidence value.
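The overlap-ratio step can be sketched as: upscale the activation signature to the input-image size, binarize it, and take the activated area inside the object frame as a fraction of the total activated area. The threshold and nearest-neighbour scaling below are illustrative choices not specified by the abstract:

```python
import numpy as np

def confidence_from_overlap(activation, img_shape, box, thresh=0.5):
    """Scale an activation signature to the input-image size, binarize it,
    and return the activated area inside the object frame as a fraction of
    the total activated area. Threshold and nearest-neighbour scaling are
    illustrative choices."""
    H, W = img_shape
    h, w = activation.shape
    rows = np.minimum((np.arange(H) * h) // H, h - 1)   # nearest-neighbour upscale
    cols = np.minimum((np.arange(W) * w) // W, w - 1)
    mask = activation[rows][:, cols] >= thresh
    x0, y0, x1, y1 = box                                # object frame in image coords
    inside = mask[y0:y1, x0:x1].sum()
    total = mask.sum()
    return float(inside / total) if total else 0.0
```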

Transformation of hand-drawn sketches to digital images
11532173 · 2022-12-20 ·

Techniques are disclosed for generating a vector image from a raster image, where the raster image is, for instance, a photographed or scanned version of a hand-drawn sketch. While drawing a sketch, an artist may perform multiple strokes to draw a line, and the resultant raster image may have adjacent or partially overlapping salient and non-salient lines, where the salient lines are representative of the artist's intent, and the non-salient (or auxiliary) lines are formed due to the redundant strokes or otherwise as artefacts of the creation process. The raster image may also include other auxiliary features, such as blemishes, non-white background (e.g., reflecting the canvas on which the hand-sketch was made), and/or uneven lighting. In an example, the vector image is generated to include the salient lines, but not the non-salient lines or other auxiliary features. Thus, the generated vector image is a cleaner version of the raster image.