G06V30/19147

TEXT EXTRACTION METHOD, TEXT EXTRACTION MODEL TRAINING METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM

A text extraction method and a text extraction model training method are provided. The present disclosure relates to the technical field of artificial intelligence, in particular to the technical field of computer vision. An implementation of the method comprises: obtaining a visual encoding feature of a to-be-detected image; extracting a plurality of sets of multimodal features from the to-be-detected image, wherein each set of multimodal features includes position information of one detection frame extracted from the to-be-detected image, a detection feature in the detection frame and first text information in the detection frame; and obtaining second text information matched with a to-be-extracted attribute based on the visual encoding feature, the to-be-extracted attribute and the plurality of sets of multimodal features, wherein the to-be-extracted attribute is an attribute of text information needing to be extracted.

Method for Recognizing Text, Apparatus and Terminal Device
20230154217 · 2023-05-18 ·

The present disclosure discloses a method for recognizing text, an apparatus and a terminal device. The method for recognizing text includes: acquiring a sample text dataset, preprocessing text images in the sample text dataset, and generating a label image; inputting the label image into a text recognition model for training, extracting image features, performing down-sampling, restoring an image resolution, normalizing the output probability of the last layer using a sigmoid layer to output multiple prediction maps with different scales, and optimizing a loss function of the text recognition model to obtain a trained text recognition model; preprocessing a text image to be recognized, inputting the preprocessed text image to be recognized into the trained text recognition model, and outputting a clear-scale prediction map; and analyzing the clear-scale prediction map to obtain a text sequence of the text image to be recognized.
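
As an illustration only, the sigmoid normalization step mentioned in this abstract can be sketched in Python as below. The function names and the nested-list representation of a prediction map are assumptions made for the example and are not taken from the disclosure; the model, its layers and its loss function are not reproduced.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a raw score into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def to_probability_map(logits):
    """Apply the sigmoid element-wise to a 2D map of raw scores.

    `logits` is a hypothetical stand-in for the last layer's output:
    a list of rows, each row a list of floats.
    """
    return [[sigmoid(v) for v in row] for row in logits]


def to_probability_maps(multi_scale_logits):
    """Normalize several maps of different sizes (one per scale) independently."""
    return [to_probability_map(m) for m in multi_scale_logits]
```

Each map is normalized on its own, so maps of different resolutions can be passed together as a list.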

METHOD AND APPARATUS FOR PRE-TRAINING SEMANTIC REPRESENTATION MODEL AND ELECTRONIC DEVICE
20230147550 · 2023-05-11 ·

A method for pre-training a semantic representation model includes: for each video-text pair in pre-training data, determining a mask image sequence, a mask character sequence, and a mask image-character sequence of the video-text pair; determining a plurality of feature sequences and mask position prediction results respectively corresponding to the plurality of feature sequences by inputting the mask image sequence, the mask character sequence, and the mask image-character sequence into an initial semantic representation model; and building a loss function based on the plurality of feature sequences, the mask position prediction results respectively corresponding to the plurality of feature sequences and true mask position results, and adjusting coefficients of the semantic representation model to realize training.

QUANTUM ENHANCED WORD EMBEDDING FOR NATURAL LANGUAGE PROCESSING
20230147890 · 2023-05-11 ·

A quantum-enhanced system and method for natural language processing (NLP) for generating a word embedding on a hybrid quantum-classical computer. A training set is provided on the classical computer, wherein the training set provides at least one pair of words, and at least one binary value indicating the correlation between the pair of words. The quantum computer generates quantum state representations for each word in the pair of words. The quantum component evaluates the quantum correlation between the quantum state representations of the word pair using an engineering likelihood function and a Bayesian inference. Training the word embedding on the quantum computer is provided using an error function containing the binary value and the quantum correlation.

Continuous machine learning method and system for information extraction

Methods and systems for artificial intelligence (AI)-assisted document annotation and training of machine learning-based models for document data extraction are described. The methods and systems described herein take advantage of a continuous machine learning approach to create document processing pipelines that provide accurate and efficient data extraction from documents that include structured text, semi-structured text, unstructured text, or any combination thereof.

METHOD AND SYSTEM FOR DISTRIBUTED LEARNING AND ADAPTATION IN AUTONOMOUS DRIVING VEHICLES
20230140540 · 2023-05-04 ·

The present teaching relates to a system, method, and medium for in-situ perception in an autonomous driving vehicle. A plurality of types of sensor data acquired continuously by a plurality of types of sensors deployed on the vehicle are first received, where the plurality of types of sensor data provide information about the surroundings of the vehicle. Based on at least one model, one or more items are tracked from a first of the plurality of types of sensor data acquired by one or more of a first type of the plurality of types of sensors, wherein the one or more items appear in the surroundings of the vehicle. At least some of the one or more items are then automatically labeled on-the-fly via either cross modality validation or cross temporal validation of the one or more items and are used to locally adapt, on-the-fly, the at least one model in the vehicle.

Object detection and image cropping using a multi-detector approach
11640721 · 2023-05-02 ·

According to an exemplary embodiment, a method for pre-cropping digital image data includes: dividing the digital image into segments; computing a color value distance between corresponding pixels of neighboring segments of the digital image; comparing the color value distance(s) against a minimum color distance threshold; clustering neighboring segments having a color value distance less than or equal to the minimum color distance threshold; computing a connected structure based on the clustered segments; computing a polygon bounding the connected structure; comparing a fraction of segments included in the connected structure and the polygon, relative to a total number of segments in the digital image, to a minimum included segment threshold; and in response to determining the fraction of segments in the connected structure and the polygon, relative to the total number of segments meets or exceeds a minimum included segment threshold, cropping the digital image based on edges of the polygon.
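
As an illustration only, the pre-cropping steps listed in this abstract can be sketched in Python as follows. Several simplifications are assumptions made for the example and are not taken from the patent: each segment is represented by a single RGB tuple in a 2D grid, the color value distance is Euclidean, the connected structure is taken to be the largest cluster of 4-connected segments, and the bounding polygon is an axis-aligned rectangle. All function and parameter names are hypothetical.

```python
import math
from collections import deque


def largest_cluster(grid, max_dist):
    """Return the largest group of 4-connected segments whose neighboring
    colors differ by at most `max_dist` (Euclidean RGB distance)."""
    rows, cols = len(grid), len(grid[0])
    seen, best = set(), []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen:
                continue
            seen.add((r, c))
            cluster, queue = [], deque([(r, c)])
            while queue:  # breadth-first flood fill over similar neighbors
                cr, cc = queue.popleft()
                cluster.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen
                            and math.dist(grid[cr][cc], grid[nr][nc]) <= max_dist):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            if len(cluster) > len(best):
                best = cluster
    return best


def precrop_box(grid, max_dist, min_fraction):
    """Return (top, left, bottom, right) segment indices to crop to, or None
    when the cluster covers less than `min_fraction` of all segments."""
    cluster = largest_cluster(grid, max_dist)
    total = len(grid) * len(grid[0])
    if len(cluster) / total < min_fraction:
        return None
    rs = [r for r, _ in cluster]
    cs = [c for _, c in cluster]
    return (min(rs), min(cs), max(rs), max(cs))
```

For a 3x3 grid whose left column is black and whose other two columns are white, `precrop_box(grid, 10, 0.5)` returns the rectangle covering the white columns, while a `min_fraction` of 0.9 returns `None` and the image is left uncropped.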

OBJECT DETECTION AND IMAGE CROPPING USING A MULTI-DETECTOR APPROACH
20230206664 · 2023-06-29 ·

Computer-implemented methods for detecting objects within digital image data based on color transitions include: receiving or capturing a digital image depicting an object; sampling color information from a first plurality of pixels of the digital image, wherein each of the first plurality of pixels is located in a background region of the digital image; assigning each pixel a label of either foreground or background using an adaptive label learning process; binarizing the digital image based on the labels assigned to each pixel; detecting contour(s) within the binarized digital image; and defining edge(s) of the object based on the detected contour(s). Corresponding systems and computer program products configured to perform the inventive methods are also described.
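
As an illustration only, the sequence of sampling background color, labeling pixels, binarizing and locating object edges can be sketched in Python as below. The sketch does not implement the adaptive label learning process named in this abstract; it substitutes a fixed distance threshold against the mean color of the image border, and it reports the extent of the foreground mask rather than traced contours. The choice of border pixels as the background sample, and all names and parameters, are assumptions made for the example.

```python
import math


def binarize_by_background(image, tolerance):
    """Label each pixel 1 (foreground) or 0 (background).

    `image` is a list of rows of RGB tuples. Border pixels are assumed to
    lie in the background; their mean color is the background reference.
    """
    rows, cols = len(image), len(image[0])
    border = [image[r][c] for r in range(rows) for c in range(cols)
              if r in (0, rows - 1) or c in (0, cols - 1)]
    mean = tuple(sum(p[i] for p in border) / len(border) for i in range(3))
    return [[1 if math.dist(px, mean) > tolerance else 0 for px in row]
            for row in image]


def object_edges(mask):
    """Return the outermost rows and columns of the foreground, or None."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    if not coords:
        return None
    rs = [r for r, _ in coords]
    cs = [c for _, c in coords]
    return {"top": min(rs), "left": min(cs), "bottom": max(rs), "right": max(cs)}
```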

METAMODELING FOR CONFIDENCE PREDICTION IN MACHINE LEARNING BASED DOCUMENT EXTRACTION

A document extraction system executed by a processor may process documents using manual and automated systems. The document extraction system may efficiently route tasks to the manual and automated systems based on a predicted probability that the results generated by the automated system meet some baseline level of accuracy. To increase document processing speed, documents having a high likelihood of accurate automated processing may be routed to an automated system. To ensure a baseline level of accuracy, documents having a smaller likelihood of accurate automated processing may be routed to a manual system.
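
As an illustration only, the routing decision described in this abstract can be sketched in Python as below. The `Document` class, the `predicted_accuracy` field, the queue names and the 0.9 default threshold are all hypothetical; the abstract does not specify how the probability is computed or what threshold is used.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    # Hypothetical metamodel output: predicted probability that automated
    # extraction meets the baseline level of accuracy.
    predicted_accuracy: float


def route(doc: Document, threshold: float = 0.9) -> str:
    """Send likely-accurate documents to automation, the rest to manual review."""
    return "automated" if doc.predicted_accuracy >= threshold else "manual"


def route_batch(docs, threshold: float = 0.9):
    """Group document ids by the queue each document is routed to."""
    queues = {"automated": [], "manual": []}
    for doc in docs:
        queues[route(doc, threshold)].append(doc.doc_id)
    return queues
```

Raising the threshold in this sketch sends more documents to the manual queue, trading processing speed for accuracy.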

SEMANTIC REPRESENTATION OF TEXT IN DOCUMENT
20230206670 · 2023-06-29 ·

There is provided a solution for semantic representation of text in a document. In this solution, textual information comprising a sequence of text elements (220) and layout information (230) of the text elements are determined from a document. The layout information (230) indicates a spatial arrangement of the plurality of text elements (220) presented within the document. Based at least in part on the plurality of text elements (220) and the layout information (230), respective semantic feature representations (180) of the plurality of text elements (220) are generated. By jointly using both the textual information and the layout information (230), rich semantics of the text elements (220) in the document can be effectively captured in the feature representations.