Patent classifications
G06V30/42
Compositional pipeline for generating synthetic training data for machine learning models to extract line items from OCR text
Systems and methods of generating synthetic training data for machine learning models. First, line items in source documents such as bills, invoices, and/or receipts are identified and labeled. The identification and labeling generate labeled documents. Then, in the labeled documents, the line items are augmented by adding, deleting, and/or swapping line items to generate synthetic training documents. An addition operation randomly selects one or more line items and adds the selected line item(s) to the same labeled document or another labeled document. A deletion operation randomly deletes one or more line items. A swapping operation randomly swaps line items in a single labeled document or across different labeled documents. These operations can generate synthetic labeled documents of any length, which form synthetic training data for training the machine learning models.
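The three augmentation operations described in this abstract can be illustrated with a minimal Python sketch. It assumes a labeled document is reduced to a plain list of line-item strings; the function names (add_items, delete_items, swap_items) and the sample data are hypothetical and are not taken from the patent.

```python
import random
from typing import List, Tuple

# Assumption: a labeled document is represented only by its list of line items.
Document = List[str]


def add_items(target: Document, source: Document, k: int, rng: random.Random) -> Document:
    """Insert k randomly chosen line items from source at random positions in target."""
    out = list(target)
    for item in rng.choices(source, k=k):
        out.insert(rng.randint(0, len(out)), item)
    return out


def delete_items(doc: Document, k: int, rng: random.Random) -> Document:
    """Remove k randomly chosen line items (at most len(doc))."""
    k = min(k, len(doc))
    drop = set(rng.sample(range(len(doc)), k))
    return [item for i, item in enumerate(doc) if i not in drop]


def swap_items(a: Document, b: Document, rng: random.Random) -> Tuple[Document, Document]:
    """Exchange one random line item of a with one random line item of b."""
    a, b = list(a), list(b)
    if a and b:
        i, j = rng.randrange(len(a)), rng.randrange(len(b))
        a[i], b[j] = b[j], a[i]
    return a, b


if __name__ == "__main__":
    rng = random.Random(0)
    invoice = ["Widget 2 x 3.00", "Gadget 1 x 9.50", "Shipping 4.99"]
    receipt = ["Coffee 3.25", "Bagel 2.10"]
    print(add_items(invoice, receipt, 2, rng))
    print(delete_items(invoice, 1, rng))
    print(swap_items(invoice, receipt, rng))
```

Because add_items can be applied repeatedly, documents of arbitrary length can be produced from a small labeled set. Passing the same document as both source and target corresponds to the "same labeled document" case in the abstract.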
Methods and systems for determining the authenticity of an identity document
A method for determining the authenticity of an identity document is provided that includes capturing, by an electronic device, image data of an identity document, determining a class of the identity document, and extracting, using multi-resolution convolution and octave convolution techniques, first and second frequency components from the captured image data. The first and second frequency components correspond to different spatial frequency ranges. Moreover, the method includes determining whether the first and second frequency components satisfy matching criteria with data in corresponding frequency maps. The frequency maps are created from verified documents belonging to the determined class of document. In response to determining that at least one of the first and second frequency components satisfies the matching criteria, the method determines that the identity document is genuine.
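The decision logic in this abstract (split the image into two frequency ranges, compare each against a reference map, accept if at least one matches) can be sketched as follows. This is only an illustration: it substitutes simple 2x2 average pooling and a residual for the multi-resolution and octave convolutions named in the abstract, and the matching criterion (mean absolute difference under a tolerance) and all function names are assumptions.

```python
from typing import List

Image = List[List[float]]  # grayscale image with values in [0, 1] (assumption)


def low_frequency(img: Image) -> Image:
    """2x2 average pooling: a coarse, low-frequency view of the image."""
    h, w = len(img) // 2 * 2, len(img[0]) // 2 * 2
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]


def high_frequency(img: Image) -> Image:
    """Residual left after subtracting the upsampled low-frequency view."""
    low = low_frequency(img)
    h, w = len(low) * 2, len(low[0]) * 2
    return [[img[r][c] - low[r // 2][c // 2] for c in range(w)] for r in range(h)]


def matches(component: Image, reference: Image, tol: float) -> bool:
    """Assumed matching criterion: mean absolute difference at most tol."""
    diffs = [abs(x - y)
             for row_c, row_r in zip(component, reference)
             for x, y in zip(row_c, row_r)]
    return sum(diffs) / len(diffs) <= tol


def is_genuine(img: Image, low_map: Image, high_map: Image, tol: float = 0.05) -> bool:
    """Genuine if at least one frequency component matches its reference map."""
    return (matches(low_frequency(img), low_map, tol)
            or matches(high_frequency(img), high_map, tol))
```

In the method described, the reference maps would be built from verified documents of the determined class; here they would simply be the low_frequency and high_frequency outputs of a verified sample image.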
READING AND RECOGNIZING HANDWRITTEN CHARACTERS TO IDENTIFY NAMES USING NEURAL NETWORK TECHNIQUES
A system and method for identifying handwritten characters on an image using a classification model that employs a neural network. The system includes a computer having a processor and a memory device that stores data and executable code that, when executed, causes the processor to read and convert typed text on the image to machine encoded text to identify locations of the typed text on the image; identify a location on the image that includes handwritten text based on the location of predetermined typed text on the image; identify clusters of non-white pixels in the image at the location having the handwritten text; generate an individual and separate cluster image for each identified cluster; classify each cluster image using machine learning and at least one neural network to determine the likelihood that the cluster is a certain character; and determine what character each cluster image is based on the classification.
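The cluster-identification step (finding groups of non-white pixels and cutting each into its own image) can be sketched with a plain connected-component search. This is an assumption about how such clusters might be found, not the patent's stated algorithm; the 8-neighbor connectivity, the white value of 255, and the function names are all hypothetical. The classification step with a neural network is not shown.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, column)


def find_clusters(img: List[List[int]], white: int = 255) -> List[List[Pixel]]:
    """Group non-white pixels into 8-connected clusters using breadth-first search."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    clusters: List[List[Pixel]] = []
    for r in range(h):
        for c in range(w):
            if img[r][c] == white or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels: List[Pixel] = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx] and img[ny][nx] != white):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            clusters.append(pixels)
    return clusters


def crop_cluster(img: List[List[int]], pixels: List[Pixel], white: int = 255) -> List[List[int]]:
    """Copy one cluster into its own image, sized to the cluster's bounding box."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    top, left = min(ys), min(xs)
    out = [[white] * (max(xs) - left + 1) for _ in range(max(ys) - top + 1)]
    for y, x in pixels:
        out[y - top][x - left] = img[y][x]
    return out
```

Each cropped cluster image would then be passed to the character classifier. Real handwriting often needs extra handling for touching or broken strokes, which this sketch ignores.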
DISTRIBUTED LEARNING METHOD, SERVER AND APPLICATION USING IDENTIFICATION CARD RECOGNITION MODEL, AND IDENTIFICATION CARD RECOGNITION METHOD USING THE SAME
A distributed learning method of a server managing an ID card recognition model includes releasing an ID card recognition model performing at least one convolution operation on an ID card image captured in a user terminal so that the user terminal uses the ID card recognition model, receiving update information of the ID card recognition model generated according to an ID card recognition result of the released ID card recognition model, and verifying the update information received from the user terminal and updating the ID card recognition model using the verified update information.
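The release, receive, verify, and update cycle in this abstract can be sketched as a small server class. The details are assumptions made for illustration: the model is a flat list of weights, update information is a weight delta, verification checks the model version, the delta length, and a norm bound, and verified deltas are averaged. None of these specifics come from the patent, and the class and method names are hypothetical.

```python
from typing import List, Tuple


class ModelServer:
    """Toy server managing one model as a flat weight vector (assumption)."""

    def __init__(self, weights: List[float], max_norm: float = 1.0) -> None:
        self.weights = list(weights)
        self.max_norm = max_norm  # assumed verification threshold
        self.version = 0

    def release(self) -> Tuple[int, List[float]]:
        """Release the current model to user terminals."""
        return self.version, list(self.weights)

    def verify(self, base_version: int, delta: List[float]) -> bool:
        """Accept an update only if it targets the current model and is bounded."""
        if base_version != self.version or len(delta) != len(self.weights):
            return False
        norm = sum(d * d for d in delta) ** 0.5
        return norm <= self.max_norm

    def update(self, submissions: List[Tuple[int, List[float]]]) -> int:
        """Average the verified deltas into the model; return how many were accepted."""
        accepted = [d for v, d in submissions if self.verify(v, d)]
        if accepted:
            n = len(accepted)
            self.weights = [w + sum(d[i] for d in accepted) / n
                            for i, w in enumerate(self.weights)]
            self.version += 1
        return len(accepted)
```

A terminal would call release(), run recognition locally, and send back a (version, delta) pair; the server discards any pair that fails verify() before applying the rest.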
Automatic rule prediction and generation for document classification and validation
A method is provided. The method may include, in response to electronically receiving a document, automatically classifying the document and different parts of the document, by electronically identifying a document type associated with the document and electronically tagging data associated with the different parts of the document based on classification rules. The method may further include automatically extracting the tagged data associated with the automatically classified document based on data extraction rules. The method may further include detecting first feedback associated with the classification rules and second feedback associated with the data extraction rules. The method may further include automatically generating and updating validation rules based on the identified document type, the detected first feedback, and the detected second feedback to validate the automatically classified document and the automatically tagged and extracted data.
Computer systems and computer-implemented methods utilizing a digital asset generation platform for classifying data structures
The techniques described herein relate to systems and methods for a digital asset generation platform. The digital asset generation platform may ingest an input. The digital asset generation platform may utilize a document identification engine corresponding to a first stage of a multi-stage convolutional neural network for identifying document types of documents. The digital asset generation platform may utilize an object detector engine corresponding to a second stage of the multi-stage convolutional neural network for detecting a dynamic mapping in the digital file. The digital asset generation platform may utilize a post-processing engine for classifying the dynamic mapping in the at least one digital file. The digital asset generation platform may dynamically generate a digital asset representative of the document based on the key value data pairs extracted from the dynamic mapping.