Patent classifications
G06V30/19093
INFORMATION PROCESSING APPARATUS, NON-TRANSITORY COMPUTER READABLE MEDIUM, AND METHOD FOR PROCESSING INFORMATION
An information processing apparatus includes a processor configured to: receive, from a user, first check processing information that is set for an item on a first form and that indicates first check processing relating to a check on information written on the first form as the item; obtain second check processing information that is check processing information indicating the same check processing as the first check processing and that indicates second check processing relating to a check on information written, as an item, on a second form, which is different from the first form; propose unification of the first check processing information and the second check processing information; allow the user to determine whether to unify the first check processing information and the second check processing information with each other; and reflect, if the user determines that the first check processing information and the second check processing information are to be unified with each other and at least a part of the check processing indicated by the first check processing information or the second check processing information is changed, change information indicating the changed part in the other of the first check processing information and the second check processing information.
Computer-readable recording medium storing training data generation program, training data generation method, and training data generation apparatus
A non-transitory computer-readable recording medium storing a training data generation program for causing a computer to execute processing including: identifying, from among meta-analysis literatures stored in a memory, a plurality of meta-analysis literatures in which a first literature is cited; determining a degree of similarity between the plurality of identified meta-analysis literatures based on feature information of the plurality of identified meta-analysis literatures; and in response to the degree of similarity being equal to or higher than a threshold, generating training data for machine learning including the first literature.
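The selection step described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the Jaccard measure, the averaging of pairwise scores, the record layout (`cites`, `features`), and the function names are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature sets (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_training_literature(meta_analyses, first_lit, threshold=0.5):
    """Return [first_lit] if the meta-analyses citing it are similar enough.

    meta_analyses: list of dicts with hypothetical keys
      "cites"    -> set of cited literature ids
      "features" -> set of feature tokens describing the meta-analysis
    """
    # Step 1: identify the meta-analyses in which the first literature is cited.
    citing = [m for m in meta_analyses if first_lit in m["cites"]]
    if len(citing) < 2:
        return []  # no pair to compare, so no similarity can be computed

    # Step 2: degree of similarity, taken here as the mean pairwise Jaccard score.
    sims = [jaccard(x["features"], y["features"]) for x, y in combinations(citing, 2)]
    degree = sum(sims) / len(sims)

    # Step 3: include the literature only when the threshold is met.
    return [first_lit] if degree >= threshold else []


meta_analyses = [
    {"cites": {"A", "B"}, "features": {"diabetes", "rct"}},
    {"cites": {"A"}, "features": {"diabetes", "rct", "adults"}},
    {"cites": {"B"}, "features": {"oncology"}},
]
print(select_training_literature(meta_analyses, "A"))
print(select_training_literature(meta_analyses, "B"))
```

Literature "A" is cited by two meta-analyses that share most of their features, so it is kept; "B" is cited by two dissimilar ones and is dropped.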
INFORMATION PROCESSING APPARATUS, SETTING METHOD, INSPECTION SYSTEM, AND MEDIUM
An information processing apparatus is provided. The apparatus registers, in a candidate list, in association with an inspection region of the reference image set on the setting screen, font data as a candidate of reference font data as a reference for character recognition at the time of inspection of a printed product, and sets the reference font data on the setting screen in accordance with a user selection from the candidate list. In the registering, among preregistered registration font data, font data whose similarity with a character string of the inspection region of the reference image exceeds a threshold is registered in the candidate list.
SYSTEM AND METHOD FOR MANAGING INVOICE EXCEPTIONS
A method and system for detecting deviation between invoices and receipts are disclosed. In some embodiments, the method includes receiving invoice data and receipt data. The method includes filtering the received data to generate filtered data. The method includes performing line-level matching on the filtered data based on one or more line-level attributes and one or more distance-based algorithms. The method then includes determining, from the line-level matching, matched line items and unmatched line items between each pair of invoice and receipt. The method also includes calculating one or more types of claims for both the matched line items and the unmatched line items to measure a total deviation between the invoices and receipts. The method further includes determining a level of match between the invoices and receipts and generating a recommended matching pair of invoice and receipt based on the level of match. The matches are further improved by user feedback on the recommended pairs, which is used to train a machine learning model.
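The line-level matching and deviation steps can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name a distance algorithm, so `difflib.SequenceMatcher` stands in for it; the greedy pairing, the `desc` and `amount` fields, and the deviation formula are likewise hypothetical.

```python
from difflib import SequenceMatcher


def match_lines(invoice_lines, receipt_lines, min_ratio=0.8):
    """Greedily pair invoice lines with the most similar remaining receipt line."""
    matched, unmatched_invoice = [], []
    remaining = list(receipt_lines)
    for inv in invoice_lines:
        best, best_ratio = None, 0.0
        for rec in remaining:
            ratio = SequenceMatcher(None, inv["desc"].lower(), rec["desc"].lower()).ratio()
            if ratio > best_ratio:
                best, best_ratio = rec, ratio
        if best is not None and best_ratio >= min_ratio:
            matched.append((inv, best))
            remaining.remove(best)  # each receipt line is used at most once
        else:
            unmatched_invoice.append(inv)
    return matched, unmatched_invoice, remaining


def total_deviation(matched, unmatched_invoice, unmatched_receipt):
    """Assumed deviation: amount gaps on matched lines plus all unmatched amounts."""
    dev = sum(abs(inv["amount"] - rec["amount"]) for inv, rec in matched)
    dev += sum(inv["amount"] for inv in unmatched_invoice)
    dev += sum(rec["amount"] for rec in unmatched_receipt)
    return dev


invoice = [
    {"desc": "Blue Widget 10mm", "amount": 100.0},
    {"desc": "Shipping", "amount": 20.0},
]
receipt = [{"desc": "blue widget 10mm", "amount": 90.0}]

matched, only_invoice, only_receipt = match_lines(invoice, receipt)
print(total_deviation(matched, only_invoice, only_receipt))
```

Here the widget line matches with a 10.0 price gap and the shipping line has no counterpart, so both contribute to the total.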
System and method for data drift detection
Exemplary systems and methods extract, transform, and save to memory features from a training dataset and a test dataset at extraction layers in a machine-learning model. For each data element in the training dataset, at each extraction layer, feature maps are created and grouped by k unique data labels to construct a set of k class-conditional distributions. For each data element in the datasets, distance sets between each feature map of each extraction layer and the extraction layer's class-conditional distributions are calculated and reduced to distance summary metrics. A drift test statistic for each extraction layer is computed between the datasets by comparing the extraction layer's distance summary metric distributions of the test dataset to the distance summary metric distributions of the training dataset. The measure of drift between the datasets is computed by combining the test statistics of the extraction layers through a mathematical transform.
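A rough sketch of the per-layer computation follows, under several simplifying assumptions that the abstract does not specify: each class-conditional distribution is reduced to its mean vector, the distance summary metric is the Euclidean distance to the nearest class mean, the per-layer test statistic is the two-sample Kolmogorov-Smirnov statistic, and the combining transform is a plain average. All function names are hypothetical.

```python
import math
from bisect import bisect_right


def class_means(features, labels):
    """Mean feature vector per label; a crude stand-in for a class-conditional distribution."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def min_distance(vec, means):
    """Distance summary metric: distance to the closest class mean."""
    return min(math.dist(vec, m) for m in means.values())


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)) for x in a + b)


def layer_drift(train_feats, train_labels, test_feats):
    """Drift statistic for one extraction layer."""
    means = class_means(train_feats, train_labels)
    d_train = [min_distance(v, means) for v in train_feats]
    d_test = [min_distance(v, means) for v in test_feats]
    return ks_statistic(d_train, d_test)


def combined_drift(per_layer_stats):
    """Assumed combining transform: arithmetic mean over layers."""
    return sum(per_layer_stats) / len(per_layer_stats)


train = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]]
labels = [0, 0, 1, 1]
shifted = [[2.5, 2.5], [2.6, 2.4]]

no_drift = layer_drift(train, labels, train)
drift = layer_drift(train, labels, shifted)
print(no_drift, drift, combined_drift([no_drift, drift]))
```

Test points that sit far from every class mean push the distance distribution away from the training one, which is what the statistic picks up.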
SYSTEM AND METHOD TO GENERATE SETS OF SIMILAR ASSESSMENT PAPERS
A system for generating a second set of similar assessment papers from a first set of assessment papers is disclosed. The system includes an identification module, a test paper similarity module, and a threshold indicator module. The identification module is configured for identifying a plurality of meta-tagged assessment papers based on a numerical representation of each assessment paper of the first set of assessment papers. The test paper similarity module is configured for comparing the numerical representation of each of the identified assessment papers with the numerical representations of each of the other identified assessment papers, and for assigning a numerical score to each possible pair of the identified assessment papers. The threshold indicator module is configured for clustering the identified assessment papers whose numerical score is greater than a predetermined threshold, thereby generating the second set of similar assessment papers.
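The pairwise scoring and threshold clustering can be sketched as below. The abstract does not say how papers are scored or grouped, so cosine similarity and union-find grouping of above-threshold pairs are assumptions, as are the function names and the example vectors.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two numerical representations."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similar_sets(papers, threshold=0.9):
    """Group papers linked by a pairwise score above the threshold.

    papers: dict mapping a paper id to its numerical representation.
    Returns only groups with at least two papers.
    """
    parent = {p: p for p in papers}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(papers, 2):
        if cosine(papers[a], papers[b]) > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for p in papers:
        groups.setdefault(find(p), []).append(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]


papers = {"p1": [1, 0, 1], "p2": [1, 0, 0.9], "p3": [0, 1, 0]}
print(similar_sets(papers))
```

Grouping by connected pairs means two papers can land in the same set through a shared neighbor even if their own score is below the threshold; a stricter design would require every pair in a set to exceed it.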
IMAGE PROCESSING APPARATUS AND IMAGE PROCESSING METHOD
An image processing apparatus includes an extraction portion and a fusion portion. The extraction portion extracts a first object from each of a plurality of pieces of image data that include the first object and a second object, the first object including a handwritten object, the second object including a non-handwritten object. The fusion portion generates a fusion image by fusing a plurality of first objects extracted from the plurality of pieces of image data into the second object that is common to the plurality of pieces of image data.
Method for identifying entity data in a data set
A data processing system receives a plurality of electronic documents in image format, and extracts text data using an optical character recognition processor. The system determines a plurality of candidate entity data and candidate context data based on the extracted text data using a trained natural language processing closed-domain question answering model. The system accesses n-gram words stored in a knowledge base, and determines similarity scores between each candidate context data and each of the n-gram words. The system determines a weighted average of the similarity scores, and selects an optimum entity data from the plurality of candidate entity data based on the weighted average of the similarity scores.
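The final selection step, scoring each candidate's context against knowledge-base n-grams and picking the best entity, might look like the sketch below. The OCR and question-answering stages are omitted. The use of `difflib.SequenceMatcher` as the similarity score, the per-n-gram weights, and the function names are assumptions, not details from the abstract.

```python
from difflib import SequenceMatcher


def weighted_score(context, ngrams):
    """Weighted average of similarity scores between a context and each n-gram.

    ngrams: dict mapping an n-gram string to a hypothetical weight.
    """
    total_weight = sum(ngrams.values())
    if total_weight == 0:
        return 0.0
    weighted = sum(
        weight * SequenceMatcher(None, context.lower(), gram.lower()).ratio()
        for gram, weight in ngrams.items()
    )
    return weighted / total_weight


def select_entity(candidates, ngrams):
    """Pick the entity whose context has the highest weighted average score.

    candidates: list of (entity, context) pairs.
    """
    return max(candidates, key=lambda pair: weighted_score(pair[1], ngrams))[0]


knowledge_base = {"invoice date": 2.0, "date of invoice": 1.0}
candidates = [("2023-01-05", "invoice date"), ("2023-02-04", "due date")]
print(select_entity(candidates, knowledge_base))
```

Both candidates are dates, so the entity text alone cannot separate them; the context string next to each one decides which is treated as the invoice date.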
Image processing apparatus and image processing method
An image processing apparatus includes an extraction portion and a fusion portion. The extraction portion extracts a first object from each of a plurality of pieces of image data that include the first object and a second object, the first object including a handwritten object, the second object including a non-handwritten object. The fusion portion generates a fusion image by fusing a plurality of first objects extracted from the plurality of pieces of image data into the second object that is common to the plurality of pieces of image data.
Multi-dimensional table reproduction from image
Embodiments facilitate selection and assignment of a known user model, based upon input comprising table images of original data. A table engine receives the image and performs pre-processing (e.g., rasterization, Optical Character Recognition, coordinate representation) thereupon to identify image entities. After filtering original numerical data, a similarity (e.g., a distance) is calculated between an image entity and a dimension member of the known user model. Based upon this similarity, the table engine selects and assigns the known user model to the incoming table images, generating a file representing table columns and rows. This file is received at the UI of an analytics platform, which in turn populates the model with data of the user (rather than the original data) via an API. Embodiments may be particularly valuable in allowing a user to rapidly generate multi-dimensional tables comprising their own data, based upon raw table images received from an external party.
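The filtering and model-selection steps can be illustrated with a small sketch that starts from already-recognized text tokens. Levenshtein edit distance is used as one possible distance; the abstract only says "a distance". The token list, the model names, and the summed-distance cost are hypothetical.

```python
def drop_numbers(tokens):
    """Filter out tokens that parse as numbers, keeping header-like entities."""
    kept = []
    for tok in tokens:
        try:
            float(tok.replace(",", ""))
        except ValueError:
            kept.append(tok)
    return kept


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def select_model(entities, models):
    """Choose the model whose dimension members are closest to the image entities.

    models: dict mapping a model name to a list of dimension member names.
    """
    def cost(members):
        # Each entity contributes its distance to the nearest dimension member.
        return sum(min(levenshtein(e.lower(), m.lower()) for m in members) for e in entities)

    return min(models, key=lambda name: cost(models[name]))


ocr_tokens = ["Revnue", "1,200", "Region", "Q1", "3.5"]  # "Revnue" mimics an OCR error
models = {
    "sales": ["Revenue", "Region", "Quarter"],
    "hr": ["Employee", "Department", "Salary"],
}
entities = drop_numbers(ocr_tokens)
print(entities, select_model(entities, models))
```

Using a distance rather than exact matching lets a misread header such as "Revnue" still point to the right model.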