Patent classifications
G06V30/19107
Optimization and use of codebooks for document analysis
A method of generating and optimizing a codebook for document analysis comprises: receiving a first set of document images; extracting a plurality of keypoint regions from each document image of the first set of document images; calculating local descriptors for each keypoint region of the extracted keypoint regions; clustering the local descriptors such that each center of a cluster of local descriptors corresponds to a respective visual word; generating a codebook containing a set of visual words; and optimizing the codebook by maximizing mutual information (MI) between a target field of a second set of document images and at least one visual word of the set of visual words.
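The MI-based optimization step in the abstract above can be illustrated with a minimal sketch. It assumes each visual word is reduced to a 0/1 presence flag per document and that the target field takes a small set of discrete values; the function names and the idea of ranking words by MI are illustrative assumptions, not the claimed method.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (px[x] * py[y])
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi


def rank_visual_words(presence, labels):
    """Order visual words by MI with the target field, most informative first.

    presence: dict mapping a (hypothetical) visual-word id to a list of 0/1
              flags, one per document.
    labels:   target-field value for each document, in the same order.
    """
    scores = {word: mutual_information(flags, labels) for word, flags in presence.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: "w0" perfectly predicts the label, "w1" is independent of it.
presence = {"w0": [1, 1, 0, 0], "w1": [1, 0, 1, 0]}
labels = ["a", "a", "b", "b"]
ranking = rank_visual_words(presence, labels)
```

Under this reading, a codebook could be pruned by keeping only the top-ranked words; how the actual method uses the MI scores is not stated in the abstract.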
Method for Generating Regions of Interest Based on Data Extracted from Navigational Charts
A method for extracting data from a single-layer raster navigational chart (RNC) comprising: using a computer vision algorithm to extract color, text and symbol data from the RNC, storing the color, text, and symbol data in a database, and building an RNC data vector based solely on the color, text, and symbol data of the RNC, wherein the RNC data vector identifies geographical features shown in the RNC and a location of the geographical features' corresponding pixels in the RNC; and drawing a region of interest on the navigational chart based on user input and the RNC data vector, wherein a perimeter of the region of interest is georeferenced with latitude and longitude information.
ACTIVITY CLASSIFICATION USING UNSUPERVISED MACHINE LEARNING
Systems and methods include acquisition of a first image of a first activity record, determination of first text based on the first image, generation of a first embedding based on the first text, generation of a second embedding based on the first embedding using a first model, where a number of dimensions of the second embedding is less than a number of dimensions of the first embedding, determination of a first cluster based on the second embedding using a second trained model, the second trained model trained using unsupervised learning, and determination of a first activity type associated with the first activity record based on the first cluster, the second embedding and historical activity data associating the first cluster with a plurality of activity types and each of the plurality of activity types with a respective embedding metric.
Intelligent data extraction system and method
A system and method for automating and improving data extraction from a variety of document types, including unstructured, structured, and nested content, is disclosed. The system and method incorporate a machine learning model that is designed to identify chunks of text, map the fields in the document, and extract multi-record values. The system is designed to operate with little to no human intervention, while offering significant gains in accuracy, data visualization, and efficiency. The architecture applies customized techniques including density-based adaptive text clustering, tabular data extraction based on hierarchical intelligent keyword searches, and natural language processing-based field value selection.
Failure mode discovery for machine components
The failure modes of mechanical components may be determined based on text analysis. For example, a word embedding may be determined based on a plurality of text documents that include a plurality of maintenance records characterizing failure of mechanical components. A vector representation for a particular maintenance record may then be determined based on the word embedding. Based on the vector representation, the particular maintenance record may then be identified as belonging to a particular failure mode out of a set of possible failure modes.
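One way to picture the pipeline in the abstract above (word embedding, then a per-record vector, then a failure-mode assignment) is the sketch below. The tiny hand-written embedding, the averaging of word vectors, and the nearest-centroid rule with cosine similarity are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math

# Hypothetical 2-D word embedding. A real one would be learned from the
# corpus of maintenance records.
EMBEDDING = {
    "bearing": (1.0, 0.0),
    "seized": (0.9, 0.1),
    "leak": (0.0, 1.0),
    "seal": (0.1, 0.9),
}


def record_vector(text):
    """Average the embeddings of known words; None if no word is known."""
    vecs = [EMBEDDING[w] for w in text.lower().split() if w in EMBEDDING]
    if not vecs:
        return None
    dim = len(vecs[0])
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_failure_mode(text, mode_centroids):
    """Return the failure mode whose centroid is most similar to the record."""
    vec = record_vector(text)
    if vec is None:
        return None
    return max(mode_centroids, key=lambda mode: cosine(vec, mode_centroids[mode]))


# Hypothetical failure modes, each represented by a centroid vector.
modes = {"mechanical_seizure": (1.0, 0.0), "fluid_leak": (0.0, 1.0)}
first = assign_failure_mode("Bearing seized during run", modes)
second = assign_failure_mode("seal leak found", modes)
```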
Systems and methods for short text similarity based clustering
Methods and systems for receiving a plurality of documents including short text data and determining a plurality of forward similarity values based on the short text data in each of the plurality of documents, determining a plurality of reverse similarity values based on the short text data in each of the plurality of documents, generating a forward and reverse similarity matrix based on the plurality of forward similarity values and the plurality of reverse similarity values, and generating a plurality of short text similarity based clusters to group the short text data of the plurality of documents based on the forward and reverse similarity matrix.
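The abstract above does not define "forward" and "reverse" similarity. The sketch below assumes one plausible reading: forward similarity is the fraction of the first text's tokens that appear in the second, reverse similarity is the same measure with the arguments swapped, and the matrix entry is their mean. The threshold-based grouping is likewise an assumption chosen for brevity.

```python
def forward_similarity(a, b):
    """Fraction of the distinct tokens of `a` that also occur in `b`."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a) if tokens_a else 0.0


def similarity_matrix(texts):
    """Symmetric matrix averaging forward (i -> j) and reverse (j -> i) similarity."""
    n = len(texts)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            fwd = forward_similarity(texts[i], texts[j])
            rev = forward_similarity(texts[j], texts[i])
            matrix[i][j] = (fwd + rev) / 2
    return matrix


def cluster(texts, threshold=0.5):
    """Group texts that are linked by similarity >= threshold (connected components)."""
    matrix = similarity_matrix(texts)
    n = len(texts)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


texts = ["pump failure", "pump failure again", "invoice overdue", "overdue invoice notice"]
labels = cluster(texts)
```

Averaging the two directions keeps a short text from matching a long one merely because all of its few tokens happen to occur there.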
SYSTEM AND METHOD FOR PERFORMING OPTICAL CHARACTER RECOGNITION
Techniques including a system and method for optical character recognition. The techniques may involve the use of a system. The system may include a plurality of optical character recognition engines configured to process, in parallel, at least one document or portion thereof, and produce output results for each of the optical character recognition engines. The system may include a component adapted to combine the output results of each of the optical character recognition engines and produce a single unified view of the at least one document or portion thereof.
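The parallel-engines-then-combine arrangement in the abstract above might look like the following sketch. The engines here are stand-in callables rather than real OCR libraries, and the combination rule (a per-token majority vote that assumes the outputs align token by token) is an assumption; the abstract does not say how results are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_engines(engines, image):
    """Run every engine on the same image in parallel; results keep engine order."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        return list(pool.map(lambda engine: engine(image), engines))


def combine(outputs):
    """Merge engine outputs by majority vote at each token position.

    Assumes the outputs are aligned token by token; zip() stops at the
    shortest output, so trailing tokens of longer outputs are dropped.
    """
    token_lists = [output.split() for output in outputs]
    merged = []
    for column in zip(*token_lists):
        merged.append(Counter(column).most_common(1)[0][0])
    return " ".join(merged)


# Hypothetical engines: each returns a slightly different reading of the image.
engines = [
    lambda image: "Invoice No 1234",
    lambda image: "Invoice N0 1234",
    lambda image: "lnvoice No 1234",
]
unified = combine(run_engines(engines, image=b"fake image bytes"))
```

A production system would need a real alignment step, since engines often split or merge tokens differently.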
AUTOMATIC TEMPLATE RECOMMENDATION
Embodiments are disclosed for providing customizable, visually aesthetic color diverse template recommendations derived from a source image. A method may include receiving a source image and determining a source image background by separating a foreground of the source image from a background of the source image. The method separates a foreground from the background by identifying portions of the image that belong to the background and stripping out the rest of the image. The method includes identifying a text region of the source image using a machine learning model and identifying font type using the identified text region. The method includes generating an editable template image using the source image background, the text region, and the font type.
REPORT TEMPLATE GENERATION BASED ON USER INTENT
The present disclosure relates generally to tools to determine a user's intent and, more particularly, to a system, method and computer program product to generate a report template based on a user's intent. The method includes: extracting, by a computer system, text and user selected features from one or more reports built in a reporting application; classifying, by the computer system, keywords in the text and the selected features; identifying, by the computer system, common keywords and associated selected features within the one or more reports; determining, by the computer system, an intent of the user based on the common keywords and associated selected features; and generating, by the computer system, a report template with prepopulated features of the selected features based on the intent of the user.
Positioning method and apparatus
A positioning method includes clustering points in a first point cloud through multi-clustering to obtain a target point cloud, where the target point cloud represents a feature of a target object, and the first point cloud includes the target point cloud and a point cloud that represents a feature of an interfering object; and determining a position of the target object based on the target point cloud.
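As a rough illustration of the multi-clustering idea in the abstract above, the sketch below clusters a point cloud repeatedly with a shrinking radius and keeps the largest cluster each time. The assumptions are significant and are not taken from the abstract: the target is taken to be the largest cluster, clustering is simple distance-threshold grouping, and the position is the centroid of the surviving points.

```python
import math


def radius_clusters(points, radius):
    """Group points into clusters where neighbours lie within `radius` of each other."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        stack, members = [seed], [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
            stack.extend(near)
            members.extend(near)
        clusters.append([points[k] for k in members])
    return clusters


def locate_target(points, radii=(2.0, 0.5)):
    """Cluster once per radius, keep the largest cluster, return its centroid."""
    target = list(points)
    for radius in radii:
        target = max(radius_clusters(target, radius), key=len)
    n = len(target)
    return tuple(sum(p[d] for p in target) / n for d in range(len(target[0])))


# Toy cloud: four target points near the origin, one nearby interfering point,
# and two far-away interfering points.
cloud = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (1.5, 0.0), (10.0, 10.0), (10.2, 10.0)]
position = locate_target(cloud)
```

In this toy cloud the first pass discards the distant points, and the second, tighter pass discards the nearby interfering point at (1.5, 0.0).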