G06V30/418

Enterprise profile management and control system

Systems for profile management and control are provided. A system may receive an instrument or image of an instrument. In some examples, data may be extracted from the instrument or image of the instrument and a document profile may be retrieved based on the extracted data. Images within the document profile may be evaluated to identify a type of document for each document. In some examples, a total number of documents of each type may be determined or identified. The total number of documents may be compared to a threshold. If the total number of documents is below the threshold, the documents or images in the profile may be maintained. If the total number of documents is at or above the threshold, in some examples, each document may be further evaluated to determine or identify documents or document images for deletion. In some arrangements, the profile may be refreshed and documents or images identified for deletion may be deleted.
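
The count-and-compare step described above can be sketched in Python. This is only an illustration under stated assumptions: the abstract does not say how documents are chosen for deletion once a type reaches the threshold, so the "keep the newest `keep` documents" rule, the function name `select_for_deletion`, and the dictionary fields `id`, `type`, and `received` are all hypothetical.

```python
from collections import defaultdict


def select_for_deletion(profile, threshold, keep):
    """Return ids of documents flagged for deletion, per document type.

    Assumption (not from the abstract): once a type holds `threshold` or
    more documents, only the `keep` most recent ones are retained.
    """
    by_type = defaultdict(list)
    for doc in profile:
        by_type[doc["type"]].append(doc)

    flagged = []
    for docs in by_type.values():
        if len(docs) < threshold:
            continue  # below the threshold: maintain every document of this type
        # ISO dates sort correctly as strings, newest first here.
        newest_first = sorted(docs, key=lambda d: d["received"], reverse=True)
        flagged.extend(d["id"] for d in newest_first[keep:])
    return sorted(flagged)


profile = [
    {"id": 1, "type": "check", "received": "2023-01-05"},
    {"id": 2, "type": "check", "received": "2023-02-11"},
    {"id": 3, "type": "check", "received": "2023-03-02"},
    {"id": 4, "type": "deposit_slip", "received": "2023-01-20"},
]
flagged = select_for_deletion(profile, threshold=3, keep=2)
print(flagged)  # [1]
```

Only the "check" type reaches the threshold of three, so its oldest document is flagged while the single deposit slip is maintained.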

Techniques for image content extraction

Embodiments are directed to techniques for image content extraction. Some embodiments include extracting contextually structured data from document images, such as by automatically identifying document layout, document data, document metadata, and/or correlations therebetween in a document image, for instance. Some embodiments utilize breakpoints to enable the system to match different documents with internal variations to a common template. Several embodiments include extracting contextually structured data from table images, such as gridded and non-gridded tables. Many embodiments are directed to generating and utilizing a document template database for automatically extracting document image contents into a contextually structured format. Several embodiments are directed to automatically identifying and associating document metadata with corresponding document data in a document image to generate a machine-facilitated annotation of the document image. In some embodiments, the machine-facilitated annotation may be used to generate a template for the template database.

METHODS AND SYSTEMS FOR DETERMINING AUTHENTICITY OF A DOCUMENT
20230222826 · 2023-07-13

A method for determining authenticity of a document is provided that includes receiving, by an electronic device, an image of a document, assigning a label to the image, and obtaining vectors for each image in a subset of images. Each image is of a document and is assigned the same label as the received image. Moreover, the method includes encoding the received image into a vector, calculating a distance between the vector of the received image and each obtained vector, comparing each of the calculated distances against a threshold distance, and calculating a number of the calculated distances that are less than or equal to the threshold distance. In response to determining the calculated number is at least equal to a required number, the document in the received image is determined to be authentic. Otherwise, the received image requires manual review.
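
The decision rule in this abstract (count the distances at or below a threshold, then compare the count against a required number) can be sketched as follows. The sketch assumes the images have already been encoded into vectors and uses Euclidean distance; the abstract names neither the encoder nor the distance metric, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def check_document(query_vec, reference_vecs, threshold_distance, required_number):
    """Return "authentic" or "manual_review" for an already-encoded image.

    reference_vecs holds vectors of images that carry the same label as
    the received image.
    """
    close = sum(
        1 for ref in reference_vecs
        if euclidean(query_vec, ref) <= threshold_distance
    )
    return "authentic" if close >= required_number else "manual_review"


# Toy 2-D vectors standing in for real image encodings.
refs = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
print(check_document((0.0, 0.5), refs, threshold_distance=1.0, required_number=2))
# authentic
print(check_document((10.0, 10.0), refs, threshold_distance=1.0, required_number=2))
# manual_review
```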

Text document categorization using rules and document fingerprints

Methods, apparatuses, and storage media storing instructions for classifying text documents are provided. A plurality of text documents is obtained. The plurality of text documents is classified into one or more document categories based on a plurality of classification rules. Each of the one or more document categories includes one or more first text documents of the plurality of text documents. A second text document of the plurality of text documents is classified based on the plurality of classification rules as belonging to none of the one or more document categories. One or more document fingerprints are generated for respective first text documents in the one or more document categories. The second text document is classified into one of the one or more document categories based on the one or more document fingerprints.
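
The two-stage flow (rules first, fingerprints as a fallback) might look like the sketch below. The abstract does not define what a rule or a fingerprint is, so keyword rules, token-set fingerprints, and Jaccard similarity are assumptions chosen for illustration, as are all function names.

```python
def tokens(text):
    """Lower-cased word set; used here as a simple stand-in fingerprint."""
    return set(text.lower().split())


def classify_by_rules(text, rules):
    """Return the first category whose keyword set overlaps the text, else None."""
    words = tokens(text)
    for category, keywords in rules.items():
        if words & keywords:
            return category
    return None


def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(documents, rules):
    labels, fingerprints, unmatched = {}, [], []
    for doc in documents:
        category = classify_by_rules(doc, rules)
        if category is None:
            unmatched.append(doc)  # the rules place it in no category
        else:
            labels[doc] = category
            fingerprints.append((category, tokens(doc)))
    # Second pass: assign each unmatched document to the category of the
    # most similar fingerprint.
    for doc in unmatched:
        if fingerprints:
            best = max(fingerprints, key=lambda fp: jaccard(tokens(doc), fp[1]))
            labels[doc] = best[0]
    return labels


rules = {"invoice": {"invoice"}, "resume": {"resume"}}
docs = [
    "invoice total amount due 30 days",
    "resume work experience education skills",
    "total amount due on receipt",
]
labels = classify(docs, rules)
print(labels["total amount due on receipt"])  # invoice
```

The third document matches no keyword rule, but it shares three tokens with the invoice fingerprint and none with the resume fingerprint, so it lands in the invoice category.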

Performing secondary copy operations based on deduplication performance

An improved information management system is described herein in which the information management system can evaluate the deduplication performance of secondary copy operations and dynamically adjust the manner in which secondary copy data is created to minimize the negative effects of performing deduplication. Furthermore, the improved information management system can improve deduplication performance by applying different storage policies to different types of applications running on a client computing device. Moreover, the improved information management system can automatically detect the region of a client computing device and apply an appropriate information management policy to the client computing device to avoid inconsistencies or other errors resulting from administrator control.

NON-FACTOID QUESTION ANSWERING ACROSS TASKS AND DOMAINS

An approach for a non-factoid question answering framework across tasks and domains may be provided. The approach may include training a multi-task joint learning model in a general domain. The approach may also include initializing the multi-task joint learning model in a specific target domain. The approach may include tuning the joint learning model in the target domain. The approach may include determining which task of the multiple tasks is more difficult for the multi-task joint learning model to learn. The approach may also include dynamically adjusting the weights of the multi-task joint learning model, allowing the model to concentrate on learning the more difficult learning task.
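
One simple way to shift weight toward the harder task is to make each task's weight proportional to its current loss. The abstract does not state how difficulty is measured or how the weights are updated, so this loss-proportional rule, the function names, and the task names `task_a` and `task_b` are assumptions for illustration only.

```python
def rebalance_weights(task_losses):
    """Give each task a weight proportional to its current loss.

    Assumption: a higher loss marks the task as harder to learn.
    Falls back to uniform weights when all losses are zero.
    """
    total = sum(task_losses.values())
    if total == 0:
        return {task: 1.0 / len(task_losses) for task in task_losses}
    return {task: loss / total for task, loss in task_losses.items()}


def joint_loss(task_losses, weights):
    """Weighted sum of per-task losses, as used in multi-task training."""
    return sum(weights[task] * loss for task, loss in task_losses.items())


losses = {"task_a": 1.5, "task_b": 0.5}
weights = rebalance_weights(losses)
print(weights)                      # {'task_a': 0.75, 'task_b': 0.25}
print(joint_loss(losses, weights))  # 1.25
```

In a training loop, the weights would typically be recomputed every few steps from recent per-task losses rather than fixed once.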

SYSTEMS AND METHODS FOR CLASSIFYING DOCUMENTS
20230214428 · 2023-07-06

A system may iteratively scan a portion of a document, extract first data from the portion of the document, and determine, using a trained model, whether the first data corresponds to one or more document types based on one or more confidence thresholds. The system may repeat this process, increasing the portion of the document scanned by a predetermined amount each iteration, until the first data corresponds to the one or more document types based on the one or more confidence thresholds. Responsive to determining the first data corresponds to the one or more document types based on the one or more confidence thresholds, the system may cause a graphical user interface (GUI) of a user device to display a notification indicating a document type match.
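
The grow-until-confident loop can be sketched as below. The trained model is replaced by a toy keyword scorer, the "portion" is measured in characters, and a stop condition is added for documents that never reach the threshold; all three are assumptions, and the names `classify_incrementally` and `toy_model` are hypothetical.

```python
def toy_model(text):
    """Stand-in for the trained model: confidence is the share of keywords seen."""
    keywords = ("loan", "borrower", "lender")
    hits = sum(1 for word in keywords if word in text.lower())
    return "loan_agreement", hits / len(keywords)


def classify_incrementally(text, model, confidence_threshold, start, step):
    """Scan a growing prefix of the document until the model is confident.

    Returns (document_type, characters_scanned); document_type is None if
    the whole document was scanned without reaching the threshold.
    """
    size = start
    while True:
        portion = text[:size]
        doc_type, confidence = model(portion)
        if confidence >= confidence_threshold:
            return doc_type, len(portion)
        if size >= len(text):
            return None, len(portion)  # nothing left to scan
        size += step  # enlarge the scanned portion by a fixed amount


text = "This agreement is made between the borrower and the lender for a loan."
doc_type, scanned = classify_incrementally(
    text, toy_model, confidence_threshold=0.6, start=20, step=20
)
print(doc_type, scanned)  # loan_agreement 60
```

Here the first two prefixes (20 and 40 characters) contain no complete keyword, and the 60-character prefix contains two of the three, which clears the 0.6 threshold without scanning the full text.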

Systems and methods for generating document numerical representations

Described embodiments relate to a method comprising: determining a candidate document comprising image data and character data and extracting the image data and the character data from the candidate document. The method comprises providing, to an image-based numerical representation generation model, the image data, and generating, by the image-based numerical representation generation model, an image-based numerical representation of the image data. The method comprises providing, to a character-based numerical representation generation model, the character data; and generating, by the character-based numerical representation generation model, a character-based numerical representation of the character data. The method comprises providing, to a consolidated image-character based numerical representation generation model, the image-based numerical representation and the character-based numerical representation; and generating, by the consolidated image-character based numerical representation generation model, a combined image-character based numerical representation of the candidate document.
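
The three-model pipeline (image model, character model, consolidating model) can be illustrated with simple stand-ins. Nothing here reflects the actual models in the described embodiments: the hand-written feature functions and the concatenate-then-normalize combiner are assumptions that only show how the data flows.

```python
import math


def image_representation(pixels):
    """Stand-in image model: mean and spread of grayscale values in [0, 255]."""
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean / 255, math.sqrt(variance) / 255]


def character_representation(text):
    """Stand-in character model: share of letters and share of digits."""
    n = max(len(text), 1)
    letters = sum(c.isalpha() for c in text)
    digits = sum(c.isdigit() for c in text)
    return [letters / n, digits / n]


def combined_representation(image_vec, char_vec):
    """Stand-in consolidating model: concatenate, then scale to unit length."""
    joined = image_vec + char_vec
    norm = math.sqrt(sum(x * x for x in joined)) or 1.0
    return [x / norm for x in joined]


# A candidate document reduced to toy image data and character data.
pixels = [0, 255, 255, 0]
text = "Invoice 42"
combined = combined_representation(
    image_representation(pixels), character_representation(text)
)
print(len(combined))  # 4
```

In practice each stand-in would be a learned model, and the consolidating step would usually be learned as well rather than a fixed concatenation.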