G06V30/19107

IMAGE ANALYSIS OF DATA LOGS

Systems, methods, and software for analyzing data logs. In one embodiment, a method comprises collecting a plurality of the data logs from log-generating elements, converting the data logs into log images, performing image analysis on a plurality of the log images to extract insights, and generating an output based on the insights.

SYSTEMS AND METHODS FOR SHORT TEXT SIMILARITY BASED CLUSTERING
20230267281 · 2023-08-24 ·

Methods and systems for receiving a plurality of documents including short text data and determining a plurality of forward similarity values based on the short text data in each of the plurality of documents, determining a plurality of reverse similarity values based on the short text data in each of the plurality of documents, generating a forward and reverse similarity matrix based on the plurality of forward similarity values and the plurality of reverse similarity values, and generating a plurality of short text similarity based clusters to group the short text data of the plurality of documents based on the forward and reverse similarity matrix.
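The abstract above does not define how the forward and reverse similarity values are computed. As a minimal sketch, assume the forward similarity of text A to text B is the share of A's tokens that also appear in B, and the reverse similarity is the same measure with the roles swapped. The function names, the whitespace tokenizer, the 0.5 threshold, and the union-find grouping are all illustrative assumptions, not the claimed method.

```python
from itertools import combinations


def tokens(text):
    """Lowercased whitespace tokens as a set (an assumed tokenizer)."""
    return set(text.lower().split())


def forward_similarity(a, b):
    """Share of a's tokens that also occur in b (asymmetric by design)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta) if ta else 0.0


def similarity_matrix(texts):
    """matrix[i][j] holds the (forward, reverse) pair for texts i and j."""
    n = len(texts)
    return [
        [
            (forward_similarity(texts[i], texts[j]),
             forward_similarity(texts[j], texts[i]))
            for j in range(n)
        ]
        for i in range(n)
    ]


def cluster(texts, threshold=0.5):
    """Group texts whose forward AND reverse similarity both reach threshold."""
    matrix = similarity_matrix(texts)
    parent = list(range(len(texts)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(texts)), 2):
        fwd, rev = matrix[i][j]
        if min(fwd, rev) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, text in enumerate(texts):
        groups.setdefault(find(i), []).append(text)
    return list(groups.values())
```

Requiring both directions to pass the threshold keeps a very short text from being pulled into a cluster merely because all of its few tokens happen to occur in a much longer one.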

TABLE EXTRACTION FROM IMAGE-BASED DOCUMENTS

Techniques are described for extracting tables and associated content from image-based documents and generating a machine-readable representation of a table. A system is described that executes an end-to-end pipeline for extracting one or more tables from an image-based document and generating a machine-readable and editable table representation based upon the extracted contents. The processing may include using OCR techniques to extract text portions from an image-based document, identifying a region (table region) in the image-based document containing a table, identifying a subset of text portions that are located inside the table region, determining a number of rows and columns in the table to be generated, aligning the text portions and assigning row and column indices to the text portions, and generating a machine-readable table representation based upon the text portions.
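The step of assigning row and column indices to text portions can be illustrated with a small sketch. It assumes each OCR text portion has already been reduced to a hypothetical `(text, x, y)` tuple inside the table region, and it groups coordinates into bands using a fixed pixel tolerance. The tuple layout, the tolerance value, and both function names are assumptions made for illustration; the abstract does not specify how alignment is performed.

```python
def assign_indices(values, tolerance):
    """Map each coordinate to a band index.

    A new band starts when a value lies more than `tolerance` beyond the
    first value of the current band.
    """
    band_starts = []
    index_of = {}
    for v in sorted(set(values)):
        if not band_starts or v - band_starts[-1] > tolerance:
            band_starts.append(v)
        index_of[v] = len(band_starts) - 1
    return index_of


def build_table(cells, tolerance=10):
    """Place (text, x, y) cells into a row-major grid of strings."""
    if not cells:
        return []
    rows = assign_indices([y for _, _, y in cells], tolerance)
    cols = assign_indices([x for _, x, _ in cells], tolerance)
    n_rows = max(rows.values()) + 1
    n_cols = max(cols.values()) + 1
    table = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for text, x, y in cells:
        table[rows[y]][cols[x]] = text
    return table
```

For example, `build_table([("Name", 10, 5), ("Qty", 102, 7), ("Apple", 12, 40), ("3", 100, 42)])` places the four text portions into a 2 x 2 grid, with the header cells in the first row.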

Domain-Specific Phrase Mining Method, Apparatus and Electronic Device
20220138424 · 2022-05-05 ·

A domain-specific phrase mining method, apparatus and electronic device are provided. A specific implementation includes: performing word vector conversion on a domain-specific phrase in a target text to obtain a first word vector, and performing word vector conversion on an unknown phrase in the target text to obtain a second word vector, where the domain-specific phrase is a phrase in a domain to which the target text belongs; obtaining a word vector space formed by the first and second word vectors, and identifying a preset quantity of target word vectors around the second word vector in the word vector space; determining, based on similarity values indicative of similarity between the preset quantity of target word vectors and the second word vector, whether the unknown phrase is a phrase in the domain to which the target text belongs.
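The neighbor-based decision described above can be sketched as follows. This version assumes cosine similarity as the similarity value, takes the neighbors only from the known domain-specific phrase vectors, and accepts the unknown phrase when the mean similarity to its `k` nearest neighbors reaches a threshold. The choice of cosine similarity, the averaging rule, and the values of `k` and `threshold` are assumptions for illustration; the abstract leaves them open.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_domain_phrase(unknown_vec, domain_vecs, k=3, threshold=0.8):
    """Decide from the k most similar domain vectors whether the phrase fits."""
    sims = sorted((cosine(unknown_vec, v) for v in domain_vecs), reverse=True)
    nearest = sims[:k]
    if not nearest:
        return False
    return sum(nearest) / len(nearest) >= threshold
```

In practice the vectors would come from a word embedding model; short hand-written tuples are enough to exercise the logic.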

Systems and methods to process electronic images to provide image-based cell group targeting

Systems and methods are disclosed for grouping cells in a slide image that share a similar target, comprising receiving a digital pathology image corresponding to a tissue specimen, applying a trained machine learning system to the digital pathology image, the trained machine learning system being trained to predict at least one target difference across the tissue specimen, and determining, using the trained machine learning system, one or more predicted clusters, each of the predicted clusters corresponding to a subportion of the tissue specimen associated with a target.

System and method to extract information from unstructured image documents

The present disclosure relates to a system and method to extract information from unstructured image documents. The extraction technique is content-driven and not dependent on the layout of a particular image document type. The disclosed method breaks down an image document into smaller images using a text cluster detection algorithm. The smaller images are converted into text samples using optical character recognition (OCR). Each of the text samples is fed to a trained machine learning model. The model classifies each text sample into one of a plurality of pre-determined field types. The desired value extraction problem may be converted into a question-answering problem using a pre-trained model. A fixed question is formed on the basis of the classified field type. The output of the question-answering model may be passed through a rule-based post-processing step to obtain the final answer.

Computer-readable recording medium storing training data generation program, training data generation method, and training data generation apparatus
11769339 · 2023-09-26 ·

A non-transitory computer-readable recording medium storing a training data generation program for causing a computer to execute processing including: identifying, from among meta-analysis literatures stored in a memory, a plurality of meta-analysis literatures in which a first literature is cited; determining a degree of similarity between the plurality of identified meta-analysis literatures based on feature information of the plurality of identified meta-analysis literatures; and in response to the degree of similarity being equal to or higher than a threshold, generating training data for machine learning including the first literature.

DETECTING GRAPHICAL ELEMENTS IN CHARTS USING PREDICTED HEATMAPS
20230298373 · 2023-09-21 ·

An example system includes a processor to receive detected chart regions in a page of a document. The processor is to produce, via a graphical elements detector, predicted heatmaps and bounding boxes for graphical objects in the detected chart regions. The processor is also to apply a chart-type-specific analysis algorithm to the predicted heatmaps and bounding boxes to extract tabular chart data. The processor can then generate an output data file and a visualization based on the predicted heatmaps and the extracted tabular chart data.

Method and System for Detecting Drift in Text Streams

Methods and systems disclosed herein may quantify the content and nature of first streaming data to detect when the typical composition of the first streaming data changes. Quantifying the content and nature of the first streaming data may begin by generating a baseline representation of the content of the first streaming data as represented by a first matrix. Once generated, the first matrix may be used as a control against subsequently received data streams. In this regard, a second matrix may be generated from second streaming data and compared to the first matrix to determine the differences between the first streaming data and the second streaming data. Once a difference is determined, the difference may be compared to a threshold value and, when the difference exceeds the threshold value, an administrator may be notified and corrective action taken.
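The baseline-versus-current comparison above can be sketched with a simple term-count representation. The sketch assumes each matrix has one row per message and one column per vocabulary term, and that the difference is measured as the total variation distance between the normalized column totals of the two matrices. The matrix layout, the distance measure, the 0.3 threshold, and all function names are assumptions for illustration; the abstract does not say how the matrices are built or compared.

```python
from collections import Counter


def term_matrix(messages, vocab):
    """One row per message, one column per vocabulary term (raw counts)."""
    rows = []
    for message in messages:
        counts = Counter(message.lower().split())
        rows.append([counts[term] for term in vocab])
    return rows


def profile(matrix):
    """Column totals normalized to sum to 1 (all zeros if the matrix is empty of terms)."""
    totals = [sum(column) for column in zip(*matrix)]
    grand_total = sum(totals)
    return [t / grand_total if grand_total else 0.0 for t in totals]


def drift(baseline, current):
    """Total variation distance between the two streams' term profiles (0 to 1)."""
    vocab = sorted({w for m in baseline + current for w in m.lower().split()})
    p = profile(term_matrix(baseline, vocab))
    q = profile(term_matrix(current, vocab))
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def drifted(baseline, current, threshold=0.3):
    """True when the difference exceeds the threshold, i.e. when to notify."""
    return drift(baseline, current) > threshold
```

A caller would build the baseline once from the first streaming data and call `drifted` for each later window, notifying an administrator when it returns `True`.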

Positioning Method and Apparatus
20220019845 · 2022-01-20 ·

A positioning method includes clustering points in a first point cloud through multi-clustering to obtain a target point cloud, where the target point cloud represents a feature of a target object, and the first point cloud includes the target point cloud and a point cloud that represents a feature of an interfering object; and determining a position of the target object based on the target point cloud.