Patent classifications
G06V30/19107
CONTINUOUS MACHINE LEARNING METHOD AND SYSTEM FOR INFORMATION EXTRACTION
Methods and systems for artificial intelligence (AI)-assisted document annotation and training of machine learning-based models for document data extraction are described. The methods and systems described herein take advantage of a continuous machine learning approach to create document processing pipelines that provide accurate and efficient data extraction from documents that include structured text, semi-structured text, unstructured text, or any combination thereof.
DESIGN OPTIMIZATION AND USE OF CODEBOOKS FOR DOCUMENT ANALYSIS
A method of generating and optimizing a codebook for document analysis comprises: receiving a first set of document images; extracting a plurality of keypoint regions from each document image of the first set of document images; calculating local descriptors for each keypoint region of the extracted keypoint regions; clustering the local descriptors such that each center of a cluster of local descriptors corresponds to a respective visual word; generating a codebook containing a set of visual words; and optimizing the codebook by maximizing mutual information (MI) between a target field of a second set of document images and at least one visual word of the set of visual words.
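The optimization step can be illustrated with a minimal sketch. It assumes each visual word has already been reduced to a presence flag per document and that the target field is a discrete label; the function names and the top-k selection rule are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (px[x] * py[y])
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi


def select_visual_words(word_presence, target, k):
    """Keep the k visual words whose presence is most informative about the target.

    word_presence maps a visual-word id to a list of 0/1 flags, one per document.
    This top-k rule is an assumed stand-in for the patent's optimization.
    """
    scores = {w: mutual_information(flags, target) for w, flags in word_presence.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With four documents labelled `["invoice", "invoice", "receipt", "receipt"]`, a word present only in the invoices scores 1 bit, while a word present in one document of each class scores 0 bits and would be dropped first.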
LINE ITEM DETECTION IN BORDERLESS TABULAR STRUCTURED DATA
In an approach, a processor identifies a plurality of text separators in a borderless table, a text separator of the plurality of text separators defining a non-text region between two consecutive text lines in the borderless table. A processor classifies the plurality of text separators into a number of target clusters comprised in a target group based on property information related to the plurality of text separators, the number of target clusters corresponding to a number of separator types. A processor provides indication information to indicate respective separator types of the plurality of text separators based on a result of the classifying.
DATA CLASSIFICATION BASED ON RECURSIVE CLUSTERING
Methods and systems are presented for providing a machine learning model framework configured to perform complex data classifications. Upon receiving a request for classifying data, the data is recursively assigned to one or more clusters. During each iteration of clustering assignment, a set of clusters is selected based on a previously assigned cluster for the data, and the data is then assigned to a particular cluster from the selected set of clusters. The machine learning model framework also includes a plurality of machine learning models configured to perform simple data classifications. A particular machine learning model is selected from the plurality of machine learning models based on the one or more clusters to which the document is assigned. The particular machine learning model is then used to classify the document.
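One way to picture the recursive assignment is as a walk down a tree of cluster centroids, with the resulting path used as the key for choosing a model. The sketch below assumes numeric feature tuples, nearest-centroid assignment, and a dictionary of per-path models; the tree layout and all names are hypothetical illustrations, not the framework described in the abstract.

```python
def nearest(point, centroids):
    """Return the name of the centroid closest to point (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(point, centroids[name])),
    )


def assign_recursively(point, tree):
    """Walk the cluster tree; each chosen cluster determines the next candidate set."""
    path = []
    node = tree
    while node:
        centroids = {name: child["centroid"] for name, child in node.items()}
        name = nearest(point, centroids)
        path.append(name)
        # Leaf clusters have no "children" entry, which ends the loop.
        node = node[name].get("children")
    return tuple(path)


def classify(point, tree, models):
    """Select the simple model registered for the cluster path and apply it."""
    return models[assign_recursively(point, tree)](point)
```

Here `models` maps a cluster path such as `("A", "A1")` to any callable, so each leaf cluster can be served by a small, specialized classifier.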
TRANSPARENCY DETECTION METHOD BASED ON MACHINE VISION
Disclosed is a transparency detecting method based on machine vision. The transparency detecting method includes: 1) operating a Secchi disk to start the water transparency measurement, and turning on the camera for shooting; 2) determining a critical position of the Secchi disk; 3) identifying a water ruler and calculating a reading of the water ruler; and 4) outputting and displaying the calculated reading.
PRODUCT IDENTIFICATION ASSISTANCE TECHNIQUES IN AN ELECTRONIC MARKETPLACE APPLICATION
A system for assisting a user in listing items for sale in an electronic marketplace via an electronic marketplace application is disclosed. A product identification technique for assisting the user in listing of the item for sale in the electronic marketplace is determined based on initial user input provided by the user. A prompt to provide additional user input is then displayed to the user in the user interface of the electronic marketplace application, where the additional user data corresponds to the determined product identification technique for assisting the user. A listing for the item is generated based on the additional input provided by the user, and the listing is displayed to the user in the user interface of the electronic marketplace application.
Article topic alignment
A method including: analyzing, by a computing device, a plurality of portions of a document; determining, by the computing device and based on the analyzing, a concept of each of the portions of the document; comparing, by the computing device, a title of the document with the concept of each of the portions of the document; determining, by the computing device and based on the comparing, an alignment of the concept of each of the portions of the document with the title; generating, by the computing device and based on the alignment, a propensity score for each of the portions of the document; and reordering, by the computing device and based on the propensity scores, the portions of the document from most aligned with the title to least aligned with the title.
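The compare, score, and reorder steps can be sketched as follows. The abstract does not say how a concept is determined or how alignment is measured, so this example substitutes a simple word-overlap (Jaccard) score as an assumed stand-in for the propensity score; the function names are hypothetical.

```python
def tokens(text):
    """Lower-cased word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?").lower() for w in text.split()} - {""}


def alignment(title, portion):
    """Jaccard overlap between title words and portion words (assumed scoring rule)."""
    a, b = tokens(title), tokens(portion)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reorder(title, portions):
    """Order portions from most aligned with the title to least aligned.

    sorted() is stable, so portions with equal scores keep their original order.
    """
    return sorted(portions, key=lambda p: alignment(title, p), reverse=True)
```

A production system would likely replace `alignment` with a topic model or embedding similarity; the reordering step stays the same.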
IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM
The aim is to make it possible to extract character information with high accuracy even from a document image obtained by reading a document in which a logo mark or the like overlaps a character portion. By performing binarization processing for a document image obtained by reading a document, a binary image including first pixels representing a color darker than a reference and second pixels representing a color paler than the reference is generated. Then, each first pixel in the generated binary image whose corresponding pixel in the document image has a color different from the color of a character object within the document is changed to a second pixel, generating a binary image from which a background object that overlaps the character object in the document image is removed.
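The two stages described here, binarization followed by color-based removal of overlapping background, can be sketched on a tiny image held as nested lists of RGB tuples. The luminance weights, the fixed reference of 128, the Euclidean color distance, and the tolerance value are all assumptions chosen for illustration, and the character color is taken as given rather than estimated.

```python
def luminance(rgb):
    """Approximate perceived brightness of an (R, G, B) pixel, 0 to 255."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def binarize(image, reference=128):
    """Return 1 for pixels darker than the reference (first pixels), else 0."""
    return [[1 if luminance(px) < reference else 0 for px in row] for row in image]


def remove_background(image, binary, char_color, tolerance=60):
    """Change first pixels to second pixels where the source color differs from char_color."""

    def differs(px):
        # Squared Euclidean distance in RGB space against a squared tolerance.
        return sum((a - b) ** 2 for a, b in zip(px, char_color)) > tolerance ** 2

    return [
        [0 if bit and differs(px) else bit for px, bit in zip(img_row, bin_row)]
        for img_row, bin_row in zip(image, binary)
    ]
```

For a row containing a black character pixel, a dark red logo pixel, and a white background pixel, binarization marks the first two as dark, and the removal step then clears the red one when `char_color` is black.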
Failure mode discovery for machine components
The failure modes of mechanical components may be determined based on text analysis. For example, a word embedding may be determined based on a plurality of text documents that include a plurality of maintenance records characterizing failure of mechanical components. A vector representation for a particular maintenance record may then be determined based on the word embedding. Based on the vector representation, the particular maintenance record may then be identified as belonging to a particular failure mode out of a set of possible failure modes.
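A minimal sketch of the record-to-failure-mode step follows. It assumes a word embedding is already available as a dictionary, represents a maintenance record by the average of its word vectors, and picks the failure mode whose reference vector has the highest cosine similarity. The averaging, the similarity rule, and every name here are hypothetical; the abstract does not specify them.

```python
import math


def record_vector(text, embedding):
    """Average the vectors of the words in text that appear in the embedding."""
    vecs = [embedding[w] for w in text.lower().split() if w in embedding]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def failure_mode(text, embedding, mode_vectors):
    """Return the failure mode most similar to the record, or None if no word is known."""
    vec = record_vector(text, embedding)
    if vec is None:
        return None
    return max(mode_vectors, key=lambda mode: cosine(vec, mode_vectors[mode]))
```

In practice the embedding would be trained on the maintenance corpus itself, and the mode vectors might come from clustering record vectors rather than being supplied by hand.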
Detecting machine text
System receives historical text block, creates historical features for historical text block's historical text lines. System trains machine-learning model to cluster historical features into historical features clusters based on their similarities. System identifies historical features cluster as historical human text cluster. System classifies each historical text line for historical human text cluster as human text, and each historical text line for other historical features clusters as machine text. System receives text block, creates features for text block's text lines. System applies trained machine-learning model to cluster features into features clusters based on their similarities. System identifies features cluster as human text cluster. System classifies each text line for human text cluster as human text, and each text line for other features clusters as machine text. System applies human text analysis to each text line classified as human text and machine text analysis to each text line classified as machine text.