G06V30/18133

Text Line Detection
20230036812 · 2023-02-02

Implementations of the present disclosure provide a solution for text line detection. In this solution, a first text region comprising a first portion of at least a first text element and a second text region comprising a second portion of at least a second text element are determined from an image. A first feature representation is extracted from the first text region and a second feature representation is extracted from the second text region. The first and second feature representations comprise at least one of an image feature representation or a semantic feature representation of the image. A link relationship between the first and second text regions can then be determined based at least in part on the first and second feature representations. The link relationship can indicate whether the first and second portions of the first and second text elements are located in the same text line. In this way, by detecting text regions and determining their link relationship from their feature representations, the accuracy and efficiency of detecting text lines in various images can be improved.
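The linking step described above can be sketched as follows. This is only an illustrative sketch, not the method of the disclosure: the abstract does not say how the link relationship is computed, so a cosine-similarity threshold stands in for it, and the names `TextRegion`, `linked`, `group_into_lines`, and the `threshold` parameter are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class TextRegion:
    box: tuple      # (x0, y0, x1, y1) of the detected region; not used by this simple link test
    feature: list   # hypothetical feature vector (image and/or semantic)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def linked(r1, r2, threshold=0.8):
    """Assumed link test: similar features mean 'same text line'."""
    return cosine(r1.feature, r2.feature) >= threshold


def group_into_lines(regions, threshold=0.8):
    """Merge pairwise-linked regions into text lines using union-find."""
    parent = list(range(len(regions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            if linked(regions[i], regions[j], threshold):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(regions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


regions = [
    TextRegion((0, 0, 10, 10), [1.0, 0.0]),
    TextRegion((12, 0, 22, 10), [0.9, 0.1]),
    TextRegion((0, 40, 10, 50), [0.0, 1.0]),
]
lines = group_into_lines(regions)
```

Here the first two regions end up in one text line and the third in its own line. A learned classifier over the pair of feature representations could replace `linked` without changing the grouping step.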

Character Restoration Method and Apparatus, Storage Medium, and Electronic Device
20230063967 · 2023-03-02

A character restoration method and apparatus, a storage medium, and an electronic device are provided. The character restoration method includes: determining a character identifier of a character in a text region, where the character identifier uniquely identifies the character; performing encoding based at least on the character identifier; and sending the encoded data to a receiving end, which decodes the encoded data and restores the character according to the character identifier obtained after decoding. In other words, only a small amount of information is encoded, and the receiving end recovers that information by decoding in order to restore the character.
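The encode, send, decode, restore flow can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the Unicode code point is used as the character identifier, a pixel position is sent along with it, and the 8-byte layout and the function names `encode_character` and `decode_and_restore` are hypothetical.

```python
import struct

# Assumed wire format: big-endian, 4-byte code point, 2-byte x, 2-byte y.
PACKET_FORMAT = ">IHH"


def encode_character(char, x, y):
    """Sender side: encode only the character identifier and its position."""
    return struct.pack(PACKET_FORMAT, ord(char), x, y)


def decode_and_restore(payload):
    """Receiver side: decode the identifier and restore the character from it."""
    code_point, x, y = struct.unpack(PACKET_FORMAT, payload)
    return chr(code_point), x, y


payload = encode_character("字", 120, 48)
restored = decode_and_restore(payload)
```

In this sketch the sender transmits 8 bytes per character instead of the pixels of the text region, and the receiver can re-render the character locally from the decoded identifier.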

COMPUTER IMPLEMENTED METHOD FOR SEGMENTING A BINARIZED DOCUMENT
20220237932 · 2022-07-28

A computer-implemented method is disclosed for segmenting a binarized document. The method includes extracting connected components from the binarized document and discriminating (for at least one of the connected components) whether it is a text component based on a homogeneity level value. The homogeneity level value is representative of the level of homogeneity within the local region of the connected component. The local region includes the connected component and at least one adjacent connected component. The homogeneity level value is based on at least one value representative of at least one image characteristic parameter determined for the connected component and on at least one value representative of the image characteristic parameter of the at least one adjacent connected component.
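A rough sketch of the two steps named above, connected-component extraction followed by a homogeneity test against an adjacent component, might look like this. The abstract does not specify the image characteristic parameter or how adjacency is chosen, so this sketch assumes component height as the parameter, the horizontally nearest component as the neighbour, and a hypothetical `threshold` of 0.7.

```python
from collections import deque


def connected_components(grid):
    """Bounding boxes (top, left, bottom, right) of 4-connected foreground blobs."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def height(box):
    return box[2] - box[0] + 1


def homogeneity(box, neighbour):
    """Assumed homogeneity level: ratio of the smaller to the larger height."""
    h1, h2 = height(box), height(neighbour)
    return min(h1, h2) / max(h1, h2)


def is_text_component(index, boxes, threshold=0.7):
    """Classify a component as text if it resembles its nearest neighbour."""
    box = boxes[index]
    others = [b for i, b in enumerate(boxes) if i != index]
    if not others:
        return False
    nearest = min(others, key=lambda b: abs(b[1] - box[1]))
    return homogeneity(box, nearest) >= threshold


# Two short strokes of equal height and one tall stroke (e.g. a ruling line).
grid = [
    [1, 0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
boxes = connected_components(grid)
flags = [is_text_component(i, boxes) for i in range(len(boxes))]
```

The two equal-height strokes are kept as text, while the tall stroke is rejected because its height differs sharply from its neighbour's.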

DATA PROCESSING METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
20230386238 · 2023-11-30

This application discloses a data processing method and apparatus, a computer device, and a non-transitory computer-readable storage medium in the field of computer technology. For the textual data and picture data of an article, a textual feature and a picture feature are extracted separately, and the article classification to which the article belongs is predicted using a cross-modal interaction feature between the textual feature and the picture feature. The prediction thus considers the contribution of both the textual modality and the picture modality to the article classification, rather than judging from a textual perspective only. In addition, the extracted cross-modal interaction feature is not a simple concatenation of the textual feature and the picture feature; it can reflect richer and deeper inter-modal interaction information and thereby improve the accuracy of identifying the article classification. In scenarios that involve identifying high-quality articles, it can also improve the accuracy with which such articles are discovered.

Semantic data type classification in rectangular datasets

Provided is a method, computer program product, and system for automatically predicting unknown semantic data types in a rectangular dataset using a holistic knowledge of said dataset. A processor may receive one or more rectangular datasets, the one or more rectangular datasets comprising a plurality of columns having a set of known semantic data types. The processor may extract a set of features from the plurality of columns, where the set of features is used to determine a relationship among each column of the plurality of columns. The processor may construct a set of training data based on the extracted set of features. Using the training data, the processor may train a machine learning model to predict a semantic data type of a target column in a rectangular dataset having an unknown semantic data type.
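The pipeline in this abstract (per-column features, a relationship among columns, training data, a trained model) can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the three column statistics, the use of the mean features of the other columns as the "relationship" context, and a nearest-centroid classifier as a stand-in for the unspecified machine learning model.

```python
def column_features(values):
    """Hypothetical per-column statistics computed from string cell values."""
    n = len(values)
    numeric = sum(v.replace(".", "", 1).isdigit() for v in values) / n
    mean_len = sum(len(v) for v in values) / n
    distinct = len(set(values)) / n
    return [numeric, mean_len, distinct]


def table_examples(table, labels):
    """Pair each column's features with the mean features of its sibling columns."""
    feats = {name: column_features(vals) for name, vals in table.items()}
    examples = []
    for name, own in feats.items():
        others = [f for other, f in feats.items() if other != name]
        if others:
            context = [sum(col) / len(others) for col in zip(*others)]
        else:
            context = [0.0] * len(own)
        examples.append((own + context, labels.get(name)))
    return examples


def train_centroids(examples):
    """Stand-in model: one mean feature vector per known semantic type."""
    sums, counts = {}, {}
    for vec, label in examples:
        if label is None:
            continue
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the semantic type whose centroid is closest to the feature vector."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], vec)),
    )


known = {"age": ["31", "45", "27"], "city": ["Paris", "Rome", "Oslo"]}
model = train_centroids(table_examples(known, {"age": "age", "city": "city"}))

unknown = {"col_a": ["52", "38", "19"], "col_b": ["Lima", "Bern", "Kyiv"]}
predictions = [predict(model, vec) for vec, _ in table_examples(unknown, {})]
```

With this toy data, `col_a` is predicted as `age` and `col_b` as `city`. Including the sibling-column context in each feature vector is one simple way to let the model use knowledge of the whole dataset rather than of a single column.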

SEMANTIC DATA TYPE CLASSIFICATION IN RECTANGULAR DATASETS

Provided is a method, computer program product, and system for automatically predicting unknown semantic data types in a rectangular dataset using a holistic knowledge of said dataset. A processor may receive one or more rectangular datasets, the one or more rectangular datasets comprising a plurality of columns having a set of known semantic data types. The processor may extract a set of features from the plurality of columns, where the set of features is used to determine a relationship among each column of the plurality of columns. The processor may construct a set of training data based on the extracted set of features. Using the training data, the processor may train a machine learning model to predict a semantic data type of a target column in a rectangular dataset having an unknown semantic data type.

Character restoration method and apparatus, storage medium, and electronic device
11902522 · 2024-02-13

A character restoration method and apparatus, a storage medium, and an electronic device are provided. The character restoration method includes: determining a character identifier of a character in a text region, where the character identifier uniquely identifies the character; performing encoding based at least on the character identifier; and sending the encoded data to a receiving end, which decodes the encoded data and restores the character according to the character identifier obtained after decoding. In other words, only a small amount of information is encoded, and the receiving end recovers that information by decoding in order to restore the character.

CIRCUIT BOARD PROCESSING SYSTEM USING LOCAL THRESHOLD VALUE IMAGE ANALYSIS
20240233423 · 2024-07-11

A circuit board processing system configured to process a panel comprises a character recognition system configured to obtain a digital image of a character imprinted on a surface of the panel. The character recognition system is further configured to perform image analysis on the digital image of the character, including applying a local threshold value to one or more localities of the digital image to provide a binarized image of the character, and to classify the character based on the image analysis. The local threshold value applied to each locality is based on pixel information around that locality.
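Local thresholding of this kind can be illustrated with a small sketch. It is an assumption-laden example rather than the system described: the threshold for each pixel is taken as the mean of its surrounding window minus an `offset`, dark pixels are treated as character ink, and the function name, `radius`, and `offset` are hypothetical.

```python
def local_threshold_binarize(image, radius=1, offset=0):
    """Mark a pixel as ink (1) when it is darker than its local mean minus offset."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Pixel information around the locality: a clipped square window.
            window = [
                image[y][x]
                for y in range(max(0, r - radius), min(rows, r + radius + 1))
                for x in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            threshold = sum(window) / len(window) - offset
            out[r][c] = 1 if image[r][c] < threshold else 0
    return out


# Unevenly lit panel: the left half is darker than the right half.
image = [
    [100, 100, 100, 200, 200, 200],
    [100,  40, 100, 200, 140, 200],
    [100, 100, 100, 200, 200, 200],
]
binary = local_threshold_binarize(image, radius=1, offset=30)
```

Both marks (values 40 and 140) are detected even though the second one is brighter than the background of the left half, which is the case a single global threshold would struggle with.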

CIRCUIT BOARD PROCESSING SYSTEM USING FRAGMENTED SEARCH REGION IMAGE ANALYSIS
20240233428 · 2024-07-11

A circuit board processing system configured to process a panel comprises a character recognition system configured to obtain a digital image of a character imprinted on a surface of the panel. The character recognition system is further configured to apply a fragmented search region to the digital image to obtain less than the whole of the character, to apply image analysis to that partial character, and to classify the character based on the image analysis.
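One way to picture classification from less than the whole character is the sketch below. It is purely illustrative: the abstract does not define the shape of the fragmented search region or the analysis applied, so this example assumes the region is a fixed set of rows of a small bitmap and uses template matching on those rows only. `TEMPLATES`, `FRAGMENT_ROWS`, and the function names are hypothetical.

```python
# Hypothetical 5x3 reference bitmaps for three characters.
TEMPLATES = {
    "I": ["010", "010", "010", "010", "010"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}

# Assumed fragmented search region: only the top and bottom rows are examined.
FRAGMENT_ROWS = [0, 4]


def extract_fragments(bitmap, rows=FRAGMENT_ROWS):
    """Return only the rows covered by the fragmented search region."""
    return [bitmap[r] for r in rows]


def classify(bitmap, templates=TEMPLATES, rows=FRAGMENT_ROWS):
    """Pick the template that best matches the observed fragments."""
    observed = extract_fragments(bitmap, rows)

    def score(name):
        reference = extract_fragments(templates[name], rows)
        return sum(
            o == t
            for obs_row, ref_row in zip(observed, reference)
            for o, t in zip(obs_row, ref_row)
        )

    return max(templates, key=score)


# An "L" with a defect in its middle row, which lies outside the search region.
damaged = ["100", "100", "110", "100", "111"]
label = classify(damaged)
```

Because the defect falls outside the examined rows, the damaged bitmap is still classified as `L`, and only two of the five rows are read.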

Computer implemented method for segmenting a binarized document
12100233 · 2024-09-24

A computer-implemented method is disclosed for segmenting a binarized document. The method includes extracting connected components from the binarized document and discriminating (for at least one of the connected components) whether it is a text component based on a homogeneity level value. The homogeneity level value is representative of the level of homogeneity within the local region of the connected component. The local region includes the connected component and at least one adjacent connected component. The homogeneity level value is based on at least one value representative of at least one image characteristic parameter determined for the connected component and on at least one value representative of the image characteristic parameter of the at least one adjacent connected component.