G06V30/414

Text recognition for a neural network
11710304 · 2023-07-25

Image data having text associated with a plurality of text-field types is received, the image data including target image data and context image data. The target image data includes target text associated with a text-field type, and the context image data provides a context for the target image data. A trained neural network that is constrained to a set of characters for the text-field type is applied to the image data. The trained neural network identifies the target text of the text-field type using a vector embedding that is based on learned patterns for recognizing the context provided by the context image data. In response to the trained neural network identifying the target text, one or more predicted characters are provided for the target text of the text-field type.
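The abstract does not say how the character constraint is enforced. One simple way to get the same effect, sketched below, is to mask the recognizer's output scores so that only characters allowed for the text-field type can be chosen. FIELD_CHARSETS, VOCAB, and constrained_greedy_decode are hypothetical names, the logits are assumed to come from some per-position recognizer, and the context embedding is not modelled at all.

```python
import numpy as np

# Hypothetical character sets per text-field type (not given in the abstract).
FIELD_CHARSETS = {
    "date": set("0123456789/-"),
    "amount": set("0123456789.,$"),
}

# Hypothetical recognizer vocabulary: column i of the logits scores VOCAB[i].
VOCAB = list("0123456789/-.,$ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def constrained_greedy_decode(logits: np.ndarray, field_type: str) -> str:
    """Greedy decoding restricted to the field type's character set.

    logits has shape (sequence_length, len(VOCAB)). Disallowed characters
    get a score of -inf, so argmax can never select them.
    """
    allowed = FIELD_CHARSETS[field_type]
    mask = np.array([0.0 if ch in allowed else -np.inf for ch in VOCAB])
    masked = logits + mask  # the mask broadcasts over every position
    return "".join(VOCAB[i] for i in masked.argmax(axis=1))


# Example: the letter "O" outscores the digit "0" at position 0, but "O" is
# not a legal character in a date field, so the digit is chosen instead.
logits = np.zeros((3, len(VOCAB)))
logits[0, VOCAB.index("O")] = 5.0
logits[0, VOCAB.index("0")] = 4.0
logits[1, VOCAB.index("1")] = 3.0
logits[2, VOCAB.index("/")] = 2.0
print(constrained_greedy_decode(logits, "date"))  # prints 01/
```

Masking at decode time leaves the trained weights untouched, so the same recognizer can serve several field types by swapping the allowed set.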

Revealing content reuse using coarse analysis

Systems and methods for managing content provenance are provided. A network system accesses a plurality of documents. The documents are then hashed to identify one or more content features within each document; in one embodiment, the hash is a MinHash. The network system compares the content features of the documents to determine a similarity score for each pair of documents; in one embodiment, the similarity score is a Jaccard score. The network system then clusters the documents into one or more clusters based on the similarity scores; in one embodiment, the clustering is performed using DBSCAN. DBSCAN can be performed iteratively with decreasing epsilon values to derive clusters of related but relatively dissimilar documents. The clustering information associated with the clusters is stored for use during runtime.
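The MinHash, Jaccard, and DBSCAN pipeline can be sketched in plain Python as follows. Assumptions: documents are represented by word 3-shingles, each MinHash "permutation" is simulated by a seeded BLAKE2b hash, distance is taken as 1 minus the estimated Jaccard score, and all function names are hypothetical. The abstract does not say how the results of successive epsilon passes are combined, so cluster_documents simply records the labels produced at each epsilon.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set[str]:
    """Word k-shingles, used here as a document's content features."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def minhash_signature(features: set[str], num_perm: int = 128) -> list[int]:
    """One minimum per seeded hash function; each seed stands in for a permutation."""
    def h(seed: int, feature: str) -> int:
        digest = hashlib.blake2b(f"{seed}:{feature}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")
    return [min(h(seed, f) for f in features) for seed in range(num_perm)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Share of matching signature slots, an estimate of |A ∩ B| / |A ∪ B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def dbscan(dist: list[list[float]], eps: float, min_pts: int) -> list[int]:
    """Textbook DBSCAN on a precomputed distance matrix; -1 marks noise."""
    n = len(dist)
    labels: list = [None] * n
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if dist[p][q] <= eps]
        if len(neighbors) < min_pts:
            labels[p] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[p] = cluster
        seeds = [q for q in neighbors if q != p]
        while seeds:
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cluster  # border point: joins but does not expand
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neighbors) >= min_pts:  # q is a core point: keep expanding
                seeds.extend(q_neighbors)
    return labels


def cluster_documents(docs: list[str], eps_values: list[float],
                      min_pts: int = 2) -> dict[float, list[int]]:
    """Run DBSCAN once per epsilon (largest first) and keep every labelling."""
    sigs = [minhash_signature(shingles(d)) for d in docs]
    dist = [[1.0 - estimated_jaccard(a, b) for b in sigs] for a in sigs]
    return {eps: dbscan(dist, eps, min_pts) for eps in sorted(eps_values, reverse=True)}


docs = [
    "the quick brown fox jumps over the lazy dog near the river bank",
    "the quick brown fox jumps over the lazy dog near the river bend",
    "completely unrelated text about invoice processing and payment terms",
]
print(cluster_documents(docs, [0.5, 0.2]))
```

The first two documents share 10 of 12 distinct shingles (true Jaccard about 0.83), so they cluster together at a loose epsilon while the third is labelled noise; a production system would more likely use a library MinHash and DBSCAN implementation than this hand-rolled one.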

Artificial intelligence based smart data engine

A machine learning computing system for extracting structured data objects from electronic documents comprising unstructured text includes an expert system computing device and a first data repository storing a plurality of electronic documents that include at least one text data object. The expert system computing device includes a processor and a non-transitory memory device storing instructions that cause the expert system to: receive a first data object comprising unstructured data identified from an electronic document stored in the first data repository; process a first set of rules to identify at least one key-value pair data object from the first data object; process, by an inference engine module, a second set of rules to identify at least one free text data object from the first data object; and store, in a non-transitory memory device, the at least one key-value pair data object and the at least one free text data object.
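The two rule passes can be illustrated with a toy extractor. The specific rules are assumptions made for illustration: a regular expression of the form "Label: value" stands in for the first rule set, and a simple sentence heuristic stands in for the inference engine's second rule set. KEY_VALUE_RULES, is_free_text, and extract are hypothetical names.

```python
import re

# First rule set (hypothetical): a short label, a colon, then a value.
KEY_VALUE_RULES = [
    re.compile(r"^\s*(?P<key>[A-Za-z][A-Za-z .#/-]{0,40}?)\s*:\s*(?P<value>\S.*)$"),
]


def is_free_text(line: str, min_words: int = 5) -> bool:
    """Second rule set (hypothetical): long enough and ends like a sentence."""
    return len(line.split()) >= min_words and line.rstrip().endswith((".", "!", "?"))


def extract(document_text: str) -> tuple[dict[str, str], list[str]]:
    """Split one unstructured data object into key-value pairs and free text."""
    key_values: dict[str, str] = {}
    free_text: list[str] = []
    for line in document_text.splitlines():
        for rule in KEY_VALUE_RULES:
            match = rule.match(line)
            if match:
                key_values[match.group("key")] = match.group("value").strip()
                break
        else:  # no key-value rule fired, so try the free-text rules
            if is_free_text(line):
                free_text.append(line.strip())
    return key_values, free_text


sample = (
    "Invoice Number: INV-1001\n"
    "Due Date: 2023-07-25\n"
    "Please remit payment within thirty days of receipt.\n"
    "---"
)
print(extract(sample))
```

Running the key-value rules first means a line is only offered to the free-text rules when no structured pattern claimed it; lines matching neither (such as "---") are dropped.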

SYSTEM AND METHOD FOR MATCHING TRANSACTION ELECTRONIC DOCUMENTS TO EVIDENCING ELECTRONIC DOCUMENTS
20180011846 · 2018-01-11

A system and method are provided for matching a second electronic document to a first electronic document that includes at least partially unstructured data of a transaction. The method includes: analyzing the at least partially unstructured data to determine at least one transaction parameter; creating a template for the first electronic document, wherein the template is a structured dataset including the determined at least one transaction parameter; determining, based on the template, a portion of the first electronic document requiring evidence; searching, based on the template, for a second electronic document, wherein the second electronic document is indicative of the evidence-requiring portion; and associating the second electronic document with the first electronic document.
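A rough sketch of the template-and-search steps follows. It assumes the transaction parameters are an invoice identifier of the form INV-<digits> and a dollar amount, that any amount at or above a threshold requires evidence, and that a candidate document counts as evidence when its own parameters match. Template, build_template, requires_evidence, and find_evidence are hypothetical names, not taken from the source.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Template:
    """Structured dataset of transaction parameters (hypothetical fields)."""
    invoice_id: Optional[str]
    amount: Optional[float]


def build_template(text: str) -> Template:
    """Pull the assumed transaction parameters out of unstructured text."""
    inv = re.search(r"\bINV-\d+\b", text)
    amt = re.search(r"\$\s?([\d,]+\.\d{2})", text)
    return Template(
        invoice_id=inv.group(0) if inv else None,
        amount=float(amt.group(1).replace(",", "")) if amt else None,
    )


def requires_evidence(t: Template, threshold: float = 100.0) -> bool:
    """Hypothetical policy: amounts at or above the threshold need evidence."""
    return t.amount is not None and t.amount >= threshold


def find_evidence(t: Template, candidates: dict[str, str]) -> Optional[str]:
    """Return the id of the first candidate whose parameters match the template."""
    if t.invoice_id is None:
        return None
    for doc_id, text in candidates.items():
        c = build_template(text)
        if c.invoice_id == t.invoice_id and c.amount == t.amount:
            return doc_id
    return None


invoice = build_template("Invoice INV-42, total due $1,250.00")
candidates = {
    "rcpt-1": "Receipt for INV-41, paid $1,250.00",
    "rcpt-2": "Receipt for INV-42, paid $1,250.00",
}
associations = {}  # first document id -> evidencing document id
if requires_evidence(invoice):
    associations["inv-doc"] = find_evidence(invoice, candidates)
print(associations)  # prints {'inv-doc': 'rcpt-2'}
```

Searching with the template rather than the raw text means candidates are compared field by field, so a receipt with the right amount but the wrong invoice identifier is rejected.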

Electronic document data extraction

Methods, systems, and computer storage media are provided for data extraction. A target document representation may be generated based on modified text of a target electronic document. A measure of similarity may be determined between the target document representation and a reference document representation, which may be based on modified text of a reference electronic document. Based on the measure of similarity, the reference document representation may be selected. An extraction model associated with the selected reference document representation can then be used to extract data from the target document.
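The selection step can be sketched with bag-of-words vectors and cosine similarity. These are assumptions: the abstract does not define how text is modified, how representations are built, or which similarity measure is used. Here "modified text" means lowercased text with digits masked, and each reference representation is paired with a hypothetical extractor function; modify_text, represent, cosine, and extract_with_best_reference are illustrative names.

```python
import math
import re
from collections import Counter
from typing import Callable


def modify_text(text: str) -> str:
    """Hypothetical modification: lowercase and mask every digit with '#'."""
    return re.sub(r"\d", "#", text.lower())


def represent(text: str) -> Counter:
    """Bag-of-words representation of the modified text."""
    return Counter(modify_text(text).split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


Reference = tuple[Counter, Callable[[str], dict]]


def extract_with_best_reference(target: str, references: dict[str, Reference]) -> tuple[str, dict]:
    """Select the most similar reference, then run its extraction model on the target."""
    target_rep = represent(target)
    best = max(references, key=lambda name: cosine(target_rep, references[name][0]))
    _, extractor = references[best]
    return best, extractor(target)


def invoice_extractor(text: str) -> dict:
    m = re.search(r"Total\s+([\d.]+)", text)
    return {"total": m.group(1) if m else None}


def po_extractor(text: str) -> dict:
    m = re.search(r"\bPO\s+(\d+)", text)
    return {"po_number": m.group(1) if m else None}


references = {
    "invoice": (represent("ACME Invoice No 1234 Date 01/02/2023 Total 99.00"), invoice_extractor),
    "purchase_order": (represent("Purchase Order PO 5678 Ship To Warehouse 7"), po_extractor),
}
target = "ACME Invoice No 9876 Date 03/04/2023 Total 150.00"
print(extract_with_best_reference(target, references))
# prints ('invoice', {'total': '150.00'})
```

Masking digits makes two invoices from the same template look nearly identical (7 of 8 tokens shared here) even though every value differs, which is the point of comparing modified rather than raw text.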

Image processing apparatus with automated registration of previously encountered business forms, image processing method and storage medium therefor
11710329 · 2023-07-25

The image processing apparatus has: an obtaining unit configured to obtain a scanned image; a first determination unit configured to determine, based on information on each registered document type, a document type whose document format is similar to the document format of the scanned image; an extraction unit configured to extract a character string corresponding to a predetermined item; a second determination unit configured to determine, using a different method, whether the document format indicated by the scanned image is similar to the document format of the document type determined by the first determination unit, in a case where a user modifies the extracted character string; and a display control unit configured to display a screen prompting the user to perform overwriting in a case where the second determination unit determines that the document format is similar.
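The abstract leaves the similarity methods unspecified. As one illustrative possibility for the first determination unit, a document format can be compared by matching the bounding boxes of its text blocks against each registered document type. Box, iou, layout_similarity, most_similar_type, and the 0.5 IoU and 0.8 similarity thresholds are all assumptions, and the second, different determination method is not modelled.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned bounding box of one text block, (x0, y0) to (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    iy = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    inter = ix * iy
    union = (a.x1 - a.x0) * (a.y1 - a.y0) + (b.x1 - b.x0) * (b.y1 - b.y0) - inter
    return inter / union if union > 0 else 0.0


def layout_similarity(scan: list[Box], registered: list[Box], min_iou: float = 0.5) -> float:
    """Fraction of the scan's text blocks that overlap some registered block."""
    if not scan:
        return 0.0
    matched = sum(any(iou(s, r) >= min_iou for r in registered) for s in scan)
    return matched / len(scan)


def most_similar_type(scan: list[Box], registered_types: dict[str, list[Box]],
                      threshold: float = 0.8) -> Optional[str]:
    """Best-matching registered document type, or None if none is similar enough."""
    scores = {name: layout_similarity(scan, boxes) for name, boxes in registered_types.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] < threshold:
        return None
    return best


registered = {
    "invoice_a": [Box(0, 0, 10, 2), Box(0, 5, 10, 7)],
    "receipt_b": [Box(20, 20, 30, 25)],
}
scan = [Box(0, 0, 10, 2.2), Box(0, 5, 10, 7)]
print(most_similar_type(scan, registered))  # prints invoice_a
```

Returning None when no registered type clears the threshold gives the apparatus a natural point at which to treat the form as new rather than forcing a match.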
