G06V30/19147

Embedding human labeler influences in machine learning interfaces in computing environments
11526713 · 2022-12-13

A mechanism is described for facilitating embedding of human labeler influences in machine learning interfaces in computing environments, according to one embodiment. A method of embodiments, as described herein, includes detecting sensor data via one or more sensors of a computing device, and accessing human labeler data at one or more databases coupled to the computing device. The method may further include evaluating relevance between the sensor data and the human labeler data, where the relevance identifies meaning of the sensor data based on human behavior corresponding to the human labeler data, and associating, based on the relevance, human labeler data with the sensor data to classify the sensor data as labeled data. The method may further include training, based on the labeled data, a machine learning model to extract human influences from the labeled data, and embed one or more of the human influences in one or more environments representing one or more physical scenarios involving one or more humans.
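
A minimal sketch of the labeling step described above, assuming hypothetical record formats: each sensor reading and each human-labeler record carries a numeric feature vector under "features", and relevance is approximated here by cosine similarity (the abstract does not specify the relevance measure).

```python
import math

def cosine(a, b):
    # simple stand-in for the relevance evaluation between sensor and labeler data
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def label_sensor_data(sensor_records, labeler_records, threshold=0.8):
    """Associate each sensor record with the most relevant human-labeler record."""
    labeled = []
    for reading in sensor_records:
        best = max(labeler_records, key=lambda r: cosine(reading["features"], r["features"]))
        if cosine(reading["features"], best["features"]) >= threshold:
            # the associated record classifies the sensor reading as labeled data
            labeled.append({"features": reading["features"], "label": best["behavior_label"]})
    return labeled  # labeled data later used to train the machine learning model

sensors = [{"features": [0.9, 0.1, 0.0]}]
labelers = [{"features": [1.0, 0.0, 0.0], "behavior_label": "walking"},
            {"features": [0.0, 1.0, 0.0], "behavior_label": "sitting"}]
print(label_sensor_data(sensors, labelers))
```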

Systems and methods for domain agnostic document extraction with zero-shot task transfer

A system for performing document extraction is configured to: (a) receive a first document; (b) extract the first document into document elements, the document elements including pages, lines, paragraphs, or any combination thereof; (c) determine a first set of fields of interest for the first document, wherein the first set of fields of interest are determined via a type of the first document or via a first set of queries for probing the first document; (d) determine, from a plurality of closed domain question answering (CDQA) models, a first set of CDQA models that provides answers to each field of interest included in the first set of fields of interest; and (e) provide answers to the first set of fields of interest to the client device.
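
A minimal sketch of steps (b) through (e), assuming hypothetical CDQA models exposed as plain callables that return an answer string or None; document extraction is reduced here to line splitting for illustration.

```python
def extract_elements(document: str) -> list[str]:
    """Split a document into line-level elements (stand-in for pages/paragraphs/lines)."""
    return [line for line in document.splitlines() if line.strip()]

def answer_fields(document: str, fields_of_interest: list[str], cdqa_models: list) -> dict:
    elements = extract_elements(document)
    answers = {}
    for field in fields_of_interest:
        # pick the first closed domain question answering model that yields an answer
        for model in cdqa_models:
            result = model(field, elements)
            if result is not None:
                answers[field] = result
                break
    return answers  # answers returned to the client device

def toy_invoice_model(question, elements):
    """Toy CDQA model: answers invoice-number questions by keyword lookup."""
    if "invoice" in question.lower():
        for line in elements:
            if "Invoice" in line:
                return line.split(":")[-1].strip()
    return None

print(answer_fields("Invoice: 12345\nTotal: $10", ["invoice number"], [toy_invoice_model]))
```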

METHOD FOR TRAINING TEXT POSITIONING MODEL AND METHOD FOR TEXT POSITIONING
20220392242 · 2022-12-08

A method for training a text positioning model includes: obtaining a sample image, where the sample image contains a sample text to be positioned and a text marking box for the sample text; inputting the sample image into a text positioning model to be trained to position the sample text, and outputting a prediction text box for the sample image; obtaining a sample prior anchor box corresponding to the sample image; and adjusting model parameters of the text positioning model based on the sample prior anchor box, the text marking box and the prediction text box, and continuing training the adjusted text positioning model based on a next sample image until model training is completed, to generate a target text positioning model.
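
A minimal sketch of the parameter-adjustment loop described above, assuming boxes are (x, y, w, h) lists and using a toy model with an illustrative update rule; the actual network, loss, and anchor handling are not specified by the abstract.

```python
def box_l1(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

class ToyPositioningModel:
    """Toy stand-in for the text positioning model: keeps one box as its 'parameters'."""
    def __init__(self):
        self.box = [0.0, 0.0, 10.0, 10.0]

    def predict(self, image):
        return list(self.box)  # prediction text box for the sample image

    def adjust(self, marking_box, anchor_box, lr=0.1):
        # nudge parameters toward the text marking box, regularised by the prior anchor box
        target = [(m + a) / 2 for m, a in zip(marking_box, anchor_box)]
        self.box = [p + lr * (t - p) for p, t in zip(self.box, target)]

def train(model, samples, tolerance=1.0):
    for image, marking_box, anchor_box in samples:     # continue with the next sample
        prediction = model.predict(image)
        loss = box_l1(prediction, marking_box)          # distance to the text marking box
        model.adjust(marking_box, anchor_box)           # adjust model parameters
        if loss < tolerance:                            # illustrative completion check
            break
    return model

samples = [("img", [2.0, 3.0, 8.0, 4.0], [2.0, 2.0, 8.0, 5.0])] * 20
print(train(ToyPositioningModel(), samples).box)
```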

CHARACTER ENCODING AND DECODING FOR OPTICAL CHARACTER RECOGNITION
20220391637 · 2022-12-08

The present disclosure provides techniques for encoding and decoding characters for optical character recognition. The techniques involve determining sets of numbers for encoding a character set, where each number in a particular set of numbers encoding a particular character is mapped to a graphical unit (e.g., a radical) of that character. A mapping between each set of numbers in the possible encodings and the character set may be determined based on the closest character already encoded. A machine learning model may be trained to perform optical character recognition using training data labeled using the set of encodings and the mappings.
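
A minimal sketch of the radical-based encoding idea, assuming a hypothetical radical inventory; each character is encoded as the set of numbers assigned to its graphical units, and an unseen encoding is mapped to the closest character already encoded (the Jaccard-style overlap used here is illustrative, not the patent's distance).

```python
RADICAL_IDS = {"氵": 1, "木": 2, "口": 3, "日": 4, "月": 5}      # graphical unit -> number

CHARACTER_RADICALS = {"沐": ["氵", "木"], "味": ["口", "木"], "明": ["日", "月"]}

def encode(character):
    """Encode a character as the numbers of its graphical units (radicals)."""
    return frozenset(RADICAL_IDS[r] for r in CHARACTER_RADICALS[character])

ENCODINGS = {encode(c): c for c in CHARACTER_RADICALS}

def decode(numbers):
    """Map a set of numbers to the closest character already encoded."""
    numbers = frozenset(numbers)
    def overlap(enc):
        return len(numbers & enc) / len(numbers | enc)
    return ENCODINGS[max(ENCODINGS, key=overlap)]

print(encode("沐"))       # e.g. frozenset({1, 2})
print(decode({1, 2}))     # exact match -> "沐"
print(decode({2, 3, 4}))  # falls back to the nearest existing encoding
```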

Image processing system, image processing apparatus, image processing method, and storage medium
11521365 · 2022-12-06

An image processing system acquires a scanned image obtained by scanning an original, and extracts a character region that includes characters from within the scanned image. The image processing system performs conversion processing, for converting a font of a character included in the extracted character region from a first font to a second font, on the scanned image using a conversion model for which training has been performed in advance so as to convert characters of the first font in an inputted image into characters of the second font and output a converted image. Then, the image processing system executes OCR on the scanned image after the conversion processing.
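
A minimal sketch of the processing order described above (font conversion first, then OCR), with hypothetical stand-ins for the pre-trained conversion model and the OCR engine; neither stub reflects the patent's actual models.

```python
def extract_character_regions(scanned_image):
    """Stand-in region extractor: pretend the whole scan is one character region."""
    return [scanned_image]

class FontConversionModel:
    """Hypothetical pre-trained model converting first-font glyphs to a second font."""
    def convert(self, region):
        return {"pixels": region["pixels"], "font": "second"}

def run_ocr(image_regions):
    """Hypothetical OCR engine, assumed to perform better on the second font."""
    return ["?" for _ in image_regions]

def process(scanned_image, conversion_model):
    regions = extract_character_regions(scanned_image)
    converted = [conversion_model.convert(r) for r in regions]   # conversion before OCR
    return run_ocr(converted)                                    # OCR on the converted scan

print(process({"pixels": [[0, 1], [1, 0]], "font": "first"}, FontConversionModel()))
```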

Using few shot learning on recognition system for character image in industrial processes

An artificial intelligence optical character recognition system and method that uses few-shot learning to recognize character images in industrial processes. The method mainly includes: preparing two or more identical neural network architecture units, inputting similar or different character images into them respectively, and comparing the computed results to determine whether the weights are similar. If the similarity reaches a set standard value, the images are classified as the same type of character; otherwise they are classified as different. Through this procedure, the training samples in the storage unit are gradually divided into character sets with different contextual meanings, forming a complete AI OCR system. By comparing characters, the system can increase the effective training sample data without enlarging the training set, while also improving its flexibility in recognizing test characters.
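
A minimal sketch of the comparison idea: two identical network units share the same weights, each embeds one character image, and a similarity threshold decides whether the pair belongs to the same character class. The embedding here is a trivial placeholder, not the patent's architecture.

```python
import math

SHARED_WEIGHTS = [0.5, 1.0, -0.25]          # the same weights are used by both units

def embed(image_vector):
    """Identical embedding unit applied to each input character image."""
    return [w * x for w, x in zip(SHARED_WEIGHTS, image_vector)]

def similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def same_character(image_a, image_b, threshold=0.95):
    """Classify the pair as the same character type if similarity reaches the standard value."""
    return similarity(embed(image_a), embed(image_b)) >= threshold

# pairwise comparison groups stored samples into character sets without adding training data
print(same_character([1.0, 0.2, 0.1], [0.9, 0.25, 0.05]))   # similar -> True
print(same_character([1.0, 0.2, 0.1], [0.0, 1.0, 0.8]))     # dissimilar -> False
```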

Generating training sets to train machine learning models

A computer system trains a machine learning model. A vector representation is generated for each document in a collection of documents. The documents are clustered based on the vector representations of the documents to produce a plurality of clusters. A training set is produced by selecting one or more documents from each cluster, wherein the selected documents represent a sample of the collection of documents to train the machine learning model. The machine learning model is trained by applying the training set to the machine learning model. Embodiments of the present invention further include a method and program product for training a machine learning model in substantially the same manner described above.
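
A minimal sketch of the training-set selection step, using TF-IDF vectors and k-means as illustrative choices; the abstract does not specify the vectorization or clustering algorithm.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "invoice total amount due",
    "invoice payment received",
    "meeting agenda for monday",
    "meeting notes and action items",
]

vectors = TfidfVectorizer().fit_transform(documents)                 # vector representation per document
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# select one representative document from each cluster as the training sample
training_set = []
for cluster_id in sorted(set(clusters)):
    index = next(i for i, c in enumerate(clusters) if c == cluster_id)
    training_set.append(documents[index])

print(training_set)   # the sampled documents are then applied to train the model
```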

Teaching GAN (generative adversarial networks) to generate per-pixel annotation

A method and apparatus for joint image and per-pixel annotation synthesis with a generative adversarial network (GAN) are provided. The method includes: by inputting data to a generative adversarial network (GAN), obtaining a first image from the GAN; inputting, to a decoder, a first feature value that is obtained from at least one intermediate layer of the GAN according to the inputting of the data to the GAN; and obtaining a first semantic segmentation mask from the decoder according to the inputting of the first feature value to the decoder.
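
A minimal PyTorch sketch of the idea: a toy generator produces an image, the feature taken from one of its intermediate layers is fed to a decoder, and the decoder outputs per-pixel class logits as a segmentation mask. The architectures are placeholders, not the paper's networks.

```python
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    def __init__(self, latent_dim=16, image_pixels=64):
        super().__init__()
        self.intermediate = nn.Linear(latent_dim, 32)   # intermediate layer whose features are reused
        self.to_image = nn.Linear(32, image_pixels)

    def forward(self, z):
        features = torch.relu(self.intermediate(z))
        image = torch.tanh(self.to_image(features))
        return image, features                          # expose the intermediate feature value

class MaskDecoder(nn.Module):
    def __init__(self, feature_dim=32, image_pixels=64, num_classes=3):
        super().__init__()
        self.head = nn.Linear(feature_dim, image_pixels * num_classes)
        self.num_classes = num_classes

    def forward(self, features):
        logits = self.head(features)
        return logits.view(features.shape[0], self.num_classes, -1)  # per-pixel class logits

generator, decoder = ToyGenerator(), MaskDecoder()
z = torch.randn(2, 16)                    # data input to the GAN
image, features = generator(z)            # first image from the GAN
mask_logits = decoder(features)           # first semantic segmentation mask from the decoder
print(image.shape, mask_logits.argmax(dim=1).shape)
```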

Learning user interface controls via incremental data synthesis

A User Interface (UI) object detection system employs an initial dataset comprising a set of images, which may include synthesized images, to train a Machine Learning (ML) engine to generate an initial trained model. A data point generator is employed to generate an updated synthesized image set which is used to further train the ML engine. The data point generator may employ images generated by an application program as a reference by which to generate the updated synthesized image set. The images generated by the application program may be tagged in advance. Alternatively, or in addition, the images generated by the application program may be captured dynamically by a user using the application program.
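
A minimal sketch of the incremental loop described above, with hypothetical stand-ins for the ML engine, the data point generator, and the application screenshots used as references; none of these names reflect a specific product API.

```python
class ToyMLEngine:
    """Stand-in detector that just records how many tagged images it has been trained on."""
    def __init__(self):
        self.samples_seen = 0

    def train(self, dataset):
        self.samples_seen += len(dataset)
        return self

def generate_synthetic_points(reference_images, per_reference=3):
    """Data point generator: derive new synthesized, pre-tagged images from app references."""
    return [{"image": f"{ref}-synth-{i}", "tags": ["button", "textbox"]}
            for ref in reference_images for i in range(per_reference)]

initial_dataset = [{"image": "synthetic-0", "tags": ["button"]}]
engine = ToyMLEngine().train(initial_dataset)              # initial trained model

app_screenshots = ["app-screen-1", "app-screen-2"]         # images generated by the application
updated_dataset = generate_synthetic_points(app_screenshots)
engine.train(updated_dataset)                              # further training on the updated set

print(engine.samples_seen)
```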

APPARATUS AND MANUAL PROVIDING APPARATUS

An apparatus includes circuitry; and a memory storing computer-executable instructions that cause the circuitry to execute acquiring training data, in which image data in a manual and text data in the manual are input data, and in which work procedure information is output data, the work procedure information being supplemented by adding, to the text data in the manual, text data that is generated based on the image data; performing machine learning by using the training data; and generating a machine learning model that outputs the work procedure information in response to receiving input of the image data in the manual and the text data in the manual.
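
A minimal sketch of the training-data construction described above, assuming a hypothetical helper that generates text from a manual's image data; the actual model and data formats are not specified by the abstract.

```python
def generate_text_from_image(image_data):
    """Hypothetical image-to-text step (e.g. describing a figure in the manual)."""
    return f"[figure shows: {image_data}]"

def build_training_example(manual_image, manual_text):
    # output data: the manual text supplemented with text generated from the image data
    work_procedure = manual_text + " " + generate_text_from_image(manual_image)
    return {"input": (manual_image, manual_text), "output": work_procedure}

training_data = [
    build_training_example("toner-replacement-diagram", "Open the front cover."),
    build_training_example("paper-tray-diagram", "Pull out the paper tray."),
]

# a real pipeline would fit a model on training_data; here we just inspect the examples
for example in training_data:
    print(example["output"])
```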