G06V30/416

Automatic reminders in a mobile environment
11704136 · 2023-07-18

Systems and methods are provided for suggesting reminders from content displayed on a mobile device. An example method may include analyzing content generated by a first mobile application and displayed on a display of a mobile device, and determining that the content suggests an event, the event including at least one entity. The method may also include providing an assistance window requesting confirmation for adding a reminder for the event in a second mobile application responsive to determining that the content suggests the event, and adding the reminder via the second mobile application responsive to receiving the confirmation. In some implementations, the first mobile application is a messaging application.
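As a rough illustration of the detect, confirm, then add flow, the sketch below assumes simple keyword and time patterns in place of whatever event/entity detector an actual implementation uses. `suggest_event`, `maybe_add_reminder`, and the `confirm` / `add_reminder` callbacks are hypothetical stand-ins for the analysis step, the assistance window, and the second application.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class SuggestedReminder:
    title: str      # the event entity, e.g. "Lunch"
    when: datetime  # resolved date and time of the event


# Hypothetical keyword patterns standing in for a real event/entity detector.
_ACTIVITY_RE = re.compile(r"\b(lunch|dinner|meeting|call|coffee)\b", re.IGNORECASE)
_TIME_RE = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)\b", re.IGNORECASE)
_DAY_RE = re.compile(r"\b(today|tomorrow)\b", re.IGNORECASE)


def suggest_event(text: str, now: datetime) -> Optional[SuggestedReminder]:
    """Return a suggestion if the on-screen text appears to describe an event."""
    activity = _ACTIVITY_RE.search(text)
    time_match = _TIME_RE.search(text)
    if not (activity and time_match):
        return None
    hour = int(time_match.group(1)) % 12
    if time_match.group(3).lower() == "pm":
        hour += 12
    minute = int(time_match.group(2) or 0)
    day = _DAY_RE.search(text)
    days_ahead = 1 if day and day.group(1).lower() == "tomorrow" else 0
    when = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=minute, second=0, microsecond=0)
    return SuggestedReminder(title=activity.group(1).capitalize(), when=when)


def maybe_add_reminder(
    text: str,
    now: datetime,
    confirm: Callable[[SuggestedReminder], bool],
    add_reminder: Callable[[SuggestedReminder], None],
) -> Optional[SuggestedReminder]:
    """Show the suggestion via `confirm`; only if accepted, hand it to `add_reminder`."""
    suggestion = suggest_event(text, now)
    if suggestion is not None and confirm(suggestion):
        add_reminder(suggestion)
        return suggestion
    return None
```

The reminder is only added after `confirm` returns true, which mirrors the requirement that the second application is invoked responsive to receiving the confirmation.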

Machine-learning for enhanced machine reading of non-ideal capture conditions

Implementations of the present disclosure include receiving a training image, providing a hash pattern that is representative of the training image, applying a plurality of filters to the training image to provide a respective plurality of filtered training images, identifying a filter to be associated with the hash pattern based on the plurality of filtered training images, and storing a mapping of the filter to the hash pattern within a set of mappings in a data store.
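A minimal sketch of the training loop, assuming an average-hash (aHash-style) bit pattern over an already downscaled grayscale image, a small hypothetical filter bank, and pixel spread as a stand-in quality score; the abstract does not say how the hash or the "best" filter is computed, so all of these are assumptions. Images are plain nested lists to keep the example dependency-free.

```python
from statistics import pstdev
from typing import Callable, Dict, List, Optional

Image = List[List[int]]  # grayscale pixels in 0..255


def average_hash(img: Image) -> str:
    """aHash-style pattern: '1' where a pixel is brighter than the image mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return "".join("1" if p > mean else "0" for p in pixels)


def _stretch(img: Image) -> Image:
    """Linear contrast stretch to the full 0..255 range."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


# Hypothetical candidate filters; a real system would use a richer bank.
FILTERS: Dict[str, Callable[[Image], Image]] = {
    "identity": lambda img: [row[:] for row in img],
    "invert": lambda img: [[255 - p for p in row] for row in img],
    "stretch": _stretch,
    "binarize": lambda img: [[255 if p > 127 else 0 for p in row] for row in img],
}


def readability_score(img: Image) -> float:
    """Stand-in quality score (pixel spread); OCR confidence could be used instead."""
    return pstdev([p for row in img for p in row])


def train_mapping(images: List[Image], store: Optional[Dict[str, str]] = None,
                  score: Callable[[Image], float] = readability_score) -> Dict[str, str]:
    """For each training image, map its hash pattern to the best-scoring filter."""
    store = {} if store is None else store
    for img in images:
        filtered = {name: f(img) for name, f in FILTERS.items()}
        best = max(filtered, key=lambda name: score(filtered[name]))
        store[average_hash(img)] = best  # a later image with the same hash overwrites
    return store


def select_filter(img: Image, store: Dict[str, str], default: str = "identity") -> str:
    """At read time, look up the filter learned for this image's hash pattern."""
    return store.get(average_hash(img), default)
```

The payoff of the mapping is at read time: a new capture is hashed once and the stored filter is applied directly, instead of trying every filter again.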

Revealing content reuse using coarse analysis

Systems and methods for managing content provenance are provided. A network system accesses a plurality of documents. The plurality of documents is then hashed to identify one or more content features within each of the documents. In one embodiment, the hash is a MinHash. The network system compares the content features of each of the plurality of documents to determine a similarity score between each of the plurality of documents. In one embodiment, the similarity score is a Jaccard score. The network system then clusters the plurality of documents into one or more clusters based on the similarity score of each of the plurality of documents. In one embodiment, the clustering is performed using DBSCAN. DBSCAN can be iteratively performed with decreasing epsilon values to derive clusters of related but relatively dissimilar documents. The clustering information associated with the clusters is stored for use during runtime.
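The pipeline named above (MinHash signatures, Jaccard estimates, DBSCAN) can be sketched in pure Python. Two assumptions are worth flagging: content features are taken to be 3-word shingles, and "iteratively performed with decreasing epsilon values" is read here as re-running DBSCAN inside each cluster from the previous pass, so the first (largest-epsilon) level holds related but relatively dissimilar documents and later levels hold tighter groups. The function names are hypothetical.

```python
import hashlib
from typing import Callable, List, Sequence, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Content features: the set of k-word shingles in a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash(features: Set[str], num_perm: int = 128) -> List[int]:
    """MinHash signature: per seeded hash function, the minimum hash over all features."""
    def h(seed: int, feature: str) -> int:
        digest = hashlib.sha1(f"{seed}:{feature}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return [min(h(seed, f) for f in features) for seed in range(num_perm)]


def jaccard_estimate(a: Sequence[int], b: Sequence[int]) -> float:
    """The fraction of matching MinHash slots estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dbscan(points: List[int], dist: Callable[[int, int], float],
           eps: float, min_pts: int) -> List[List[int]]:
    """Plain DBSCAN over point ids; returns clusters and omits noise points."""
    NOISE = -1
    labels = {}
    clusters: List[List[int]] = []
    for p in points:
        if p in labels:
            continue
        neighbors = [q for q in points if dist(p, q) <= eps]
        if len(neighbors) < min_pts:
            labels[p] = NOISE
            continue
        cid = len(clusters)
        clusters.append([p])
        labels[p] = cid
        queue = [q for q in neighbors if q != p]
        while queue:
            q = queue.pop()
            if q in labels:
                if labels[q] == NOISE:  # noise reached from a core point becomes a border point
                    labels[q] = cid
                    clusters[cid].append(q)
                continue
            labels[q] = cid
            clusters[cid].append(q)
            q_neighbors = [r for r in points if dist(q, r) <= eps]
            if len(q_neighbors) >= min_pts:  # q is a core point: keep expanding
                queue.extend(q_neighbors)
    return clusters


def iterative_dbscan(texts: List[str], eps_values: Sequence[float],
                     min_pts: int = 2) -> List[List[List[int]]]:
    """Re-run DBSCAN inside each cluster with successively smaller eps."""
    sigs = [minhash(shingles(t)) for t in texts]

    def dist(i: int, j: int) -> float:
        return 1.0 - jaccard_estimate(sigs[i], sigs[j])

    levels = []
    groups = [list(range(len(texts)))]
    for eps in sorted(eps_values, reverse=True):
        level = [c for g in groups for c in dbscan(g, dist, eps, min_pts)]
        levels.append(level)
        groups = level
    return levels
```

The returned levels are the kind of clustering information that could be persisted for runtime lookups; how it is stored is not specified in the abstract.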

Electronic document data extraction

Methods, systems, and computer storage media are provided for data extraction. A target document representation may be generated based on modified text of a target electronic document. A measure of similarity may be determined between the target document representation and a reference document representation, which may be based on modified text of a reference electronic document. Based on the measure of similarity, the reference document representation may be selected. An extraction model associated with the selected reference document representation can then be used to extract data from the target document.
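A small sketch of template matching for extraction, under several assumptions not stated in the abstract: "modified text" is taken to mean lowercased text with digit runs masked, the representation is a token set, similarity is Jaccard, and each reference's "extraction model" is a hypothetical dictionary of field names to regular expressions. The point it shows is that variable values (dates, amounts) stop influencing the match once masked.

```python
import re
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


def represent(text: str) -> FrozenSet[str]:
    """Representation built from modified text: lowercased, digit runs masked."""
    modified = re.sub(r"\d+", "<n>", text.lower())
    return frozenset(modified.split())


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


@dataclass
class Reference:
    name: str
    representation: FrozenSet[str]
    # Hypothetical extraction model: field name -> regex with one capture group.
    extraction_model: Dict[str, str]


def build_reference(name: str, text: str, model: Dict[str, str]) -> Reference:
    return Reference(name, represent(text), model)


def select_reference(target: FrozenSet[str], refs: List[Reference],
                     min_similarity: float = 0.5) -> Optional[Reference]:
    """Pick the most similar reference, or None if nothing is similar enough."""
    scored = [(jaccard(target, r.representation), r) for r in refs]
    if not scored:
        return None
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score >= min_similarity else None


def extract(target_text: str, refs: List[Reference]) -> Dict[str, str]:
    """Apply the selected reference's extraction model to the original target text."""
    ref = select_reference(represent(target_text), refs)
    if ref is None:
        return {}
    fields = {}
    for field_name, pattern in ref.extraction_model.items():
        match = re.search(pattern, target_text)
        if match:
            fields[field_name] = match.group(1)
    return fields
```

Note that matching uses the modified text, while extraction runs on the unmodified target text so the actual values are recovered.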

Table item information extraction with continuous machine learning through local and global models

A bipartite application implements a table auto-completion (TAC) algorithm on the client side and the server side. A client module runs a local model of the TAC algorithm on a user device and a server module runs a global model of the TAC algorithm on a server machine. The local model is continuously adapted through on-the-fly training, with as little as a single negative example, to perform TAC on the client side, one document at a time. Knowledge thus learned by the local model is used to improve the global model on the server side. The global model can be utilized to automatically and intelligently extract table information from a large number of documents with significantly improved accuracy, requiring minimal human intervention even on complex tables.
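The local/global split can be illustrated with a toy linear row classifier. Everything here is an assumption rather than the TAC algorithm itself: the token-shape features, the perceptron-style update from a single user correction, and the way local knowledge reaches the server (averaging weight deltas, in the spirit of federated averaging). The abstract only says that knowledge learned locally is used to improve the global model.

```python
from typing import Dict, Iterable, List


def features(line: str) -> List[str]:
    """Hypothetical features of a candidate table row: token shapes and length."""
    tokens = line.split()
    shapes = ["d" if t.replace(".", "", 1).isdigit() else "a" if t.isalpha() else "x"
              for t in tokens]
    feats = ["shape:" + "".join(shapes), f"ntok:{min(len(tokens), 5)}"]
    if shapes:
        feats.append("last:" + shapes[-1])
    return feats


class TACModel:
    """Linear scorer deciding whether a line should be auto-completed as a table row."""

    def __init__(self, weights: Dict[str, float]):
        self.weights = dict(weights)

    def score(self, line: str) -> float:
        return sum(self.weights.get(f, 0.0) for f in features(line))

    def is_row(self, line: str) -> bool:
        return self.score(line) > 0

    def update(self, line: str, is_row: bool, lr: float = 0.5) -> None:
        """Perceptron step from one user correction; one negative example is enough."""
        if self.is_row(line) == is_row:
            return  # prediction already correct, nothing to learn
        sign = 1.0 if is_row else -1.0
        for f in features(line):
            self.weights[f] = self.weights.get(f, 0.0) + sign * lr


class LocalModel(TACModel):
    """Client-side copy of the global model, adapted on the fly, one document at a time."""

    def __init__(self, global_weights: Dict[str, float]):
        super().__init__(global_weights)
        self._base = dict(global_weights)

    def delta(self) -> Dict[str, float]:
        """What this client learned relative to the global weights it started from."""
        return {f: w - self._base.get(f, 0.0) for f, w in self.weights.items()
                if w != self._base.get(f, 0.0)}


class GlobalModel(TACModel):
    """Server-side model that absorbs knowledge learned by local models."""

    def merge(self, deltas: Iterable[Dict[str, float]]) -> None:
        deltas = list(deltas)
        for d in deltas:
            for f, change in d.items():
                self.weights[f] = self.weights.get(f, 0.0) + change / len(deltas)
```

In this toy setup, a user rejecting one wrongly completed line (for example, a page footer) is enough to flip the local prediction, and the resulting delta can then be merged into the server model.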

Information processing apparatus and non-transitory computer readable medium storing program
11710333 · 2023-07-25

An information processing apparatus includes a processor configured to receive an input image containing images of multiple documents, detect in the input image one or more items predetermined as items included in a document, and perform output processing that extracts and outputs the image of each document from the input image based on the detected items.
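One way to turn detected items into per-document images is sketched below. It assumes an upstream detector already returns labeled bounding boxes, that each document contains exactly one "anchor" item (hypothetically, a title), and that every other item belongs to the document whose anchor is nearest. Each document's crop is the union of its items' boxes plus a margin. None of these rules come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


@dataclass
class Item:
    label: str  # which predetermined item was detected, e.g. "title" or "total"
    box: Box


def _center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def _dist2(a: Box, b: Box) -> float:
    (ax, ay), (bx, by) = _center(a), _center(b)
    return (ax - bx) ** 2 + (ay - by) ** 2


def group_items(items: List[Item], anchor_label: str) -> List[List[Item]]:
    """Assume one anchor per document; attach every other item to the nearest anchor."""
    anchors = [it for it in items if it.label == anchor_label]
    if not anchors:
        return []
    groups = [[a] for a in anchors]
    for it in items:
        if it.label == anchor_label:
            continue
        nearest = min(range(len(anchors)), key=lambda i: _dist2(anchors[i].box, it.box))
        groups[nearest].append(it)
    return groups


def document_boxes(items: List[Item], anchor_label: str, width: int, height: int,
                   margin: int = 10) -> List[Box]:
    """One crop box per document: union of its item boxes plus a margin, clipped to the image."""
    boxes = []
    for group in group_items(items, anchor_label):
        boxes.append((
            max(0, min(it.box[0] for it in group) - margin),
            max(0, min(it.box[1] for it in group) - margin),
            min(width, max(it.box[2] for it in group) + margin),
            min(height, max(it.box[3] for it in group) + margin),
        ))
    return boxes


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut one document's region out of a row-major image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]
```

For two receipts scanned side by side, the detected titles and totals split into two groups, and each group yields its own crop.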