G06V30/414

Table item information extraction with continuous machine learning through local and global models

A bipartite application implements a table auto-completion (TAC) algorithm on the client side and the server side. A client module runs a local model of the TAC algorithm on a user device and a server module runs a global model of the TAC algorithm on a server machine. The local model is continuously adapted through on-the-fly training, with as few as one negative example, to perform TAC on the client side, one document at a time. Knowledge thus learned by the local model is used to improve the global model on the server side. The global model can then be used to extract table information automatically from a large number of documents with significantly improved accuracy, requiring minimal human intervention even on complex tables.
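
As a rough illustration of the local/global split described above, the following Python sketch adapts a local model from one negative example and then blends what it learned into a global model. The perceptron-style `LinearModel`, the feature names, and the `merge_into_global` rule are assumptions made for illustration; the abstract does not disclose the model type or how knowledge is transferred.

```python
from dataclasses import dataclass, field


@dataclass
class LinearModel:
    """Scores a candidate text line as a table item (score > 0) or not."""
    weights: dict = field(default_factory=dict)

    def score(self, features):
        return sum(self.weights.get(name, 0.0) * value
                   for name, value in features.items())

    def predict(self, features):
        return self.score(features) > 0.0

    def update(self, features, is_table_item, lr=1.0):
        """Perceptron step: adjust weights only when the prediction is wrong."""
        if self.predict(features) == is_table_item:
            return
        sign = 1.0 if is_table_item else -1.0
        for name, value in features.items():
            self.weights[name] = self.weights.get(name, 0.0) + sign * lr * value


def merge_into_global(global_model, local_model, rate=0.5):
    """Assumed merge rule: move each global weight part-way to the local one."""
    for name in set(global_model.weights) | set(local_model.weights):
        g = global_model.weights.get(name, 0.0)
        l = local_model.weights.get(name, 0.0)
        global_model.weights[name] = g + rate * (l - g)


# Server side: global model. Client side: local copy for the current document.
global_model = LinearModel({"has_price": 1.0, "has_quantity": 1.0,
                            "is_total_line": 0.5})
local_model = LinearModel(dict(global_model.weights))

# A "Total" line that the model wrongly proposes as a table item.
total_line = {"has_price": 1.0, "is_total_line": 1.0}
predicted_before = local_model.predict(total_line)
local_model.update(total_line, is_table_item=False)  # one negative example
predicted_after = local_model.predict(total_line)

# Knowledge learned locally is blended into the global model.
merge_into_global(global_model, local_model)
```

In this sketch a single rejection flips the local prediction for the "Total" line, while the global model moves only part of the way, so one user's correction does not overwrite what the server has learned from other documents.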

Information processing apparatus and non-transitory computer readable medium storing program
11710333 · 2023-07-25

An information processing apparatus includes a processor configured to receive an input image including images of plural documents, detect in the input image one or more items determined in advance as items included in a document, and execute output processing that extracts and outputs the image of each document from the input image based on the detected one or more items.
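
One possible reading of this flow is sketched below in Python: candidate regions of the input image are kept as documents only when they contain every item determined in advance. The `Box` type, the candidate regions, the detection tuples, and the item names are all hypothetical; the abstract does not say how items are detected or how document boundaries are derived from them.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom


def extract_documents(image, regions, detections, required_items):
    """Return crops of the regions that contain every required item.

    image: list of pixel rows; regions: candidate Box list;
    detections: (item_name, x, y) tuples from a hypothetical item detector.
    """
    crops = []
    for region in regions:
        found = {name for name, x, y in detections if region.contains(x, y)}
        if required_items <= found:
            crops.append([row[region.left:region.right]
                          for row in image[region.top:region.bottom]])
    return crops


# A 4x6 "image" holding two side-by-side candidate regions.
image = [[r * 10 + c for c in range(6)] for r in range(4)]
regions = [Box(0, 0, 3, 4), Box(3, 0, 6, 4)]
detections = [("date", 1, 0), ("total", 2, 3), ("date", 4, 1)]

crops = extract_documents(image, regions, detections, {"date", "total"})
```

Only the left region contains both a date and a total, so it is the only one returned as a document image here.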

Method and apparatus for automatically extracting information from unstructured data

Various methods, apparatuses/systems, and media for automatically extracting information from unstructured data are provided. A receiver receives digitized data of a document having an unstructured data format. A processor applies machine learning models for sectioning the digitized data. An OCR device applies OCR processing to the sectioned digitized data. The processor matches the sectioned digitized data to patterns and rules; applies classification models to the matched digitized data to identify entities and events from the sectioned digitized data; automatically links each entity with its corresponding event in a hierarchical format to generate a document having a structured data format; and outputs the document having the structured data, with metadata having the linked entity with its corresponding event in the hierarchical format, to downstream applications.
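
The matching and linking steps above can be sketched in Python as follows, assuming sectioning and OCR have already produced plain-text sections. The regular expressions, the keyword rules, and the output schema are hypothetical stand-ins for the patterns, rules, and classification models that the abstract mentions but does not specify.

```python
import json
import re

# Hypothetical patterns and rules; a real system would use trained classifiers.
ENTITY_PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "amount": re.compile(r"\$\d+(?:\.\d{2})?"),
}
EVENT_KEYWORDS = {"payment": "paid", "shipment": "shipped"}


def structure_sections(sections):
    """Link the entities found in each section to that section's event."""
    document = {"events": []}
    for text in sections:
        event_type = next((name for name, keyword in EVENT_KEYWORDS.items()
                           if keyword in text.lower()), None)
        if event_type is None:
            continue  # no rule matched this section
        entities = [{"type": kind, "value": match.group()}
                    for kind, pattern in ENTITY_PATTERNS.items()
                    for match in pattern.finditer(text)]
        document["events"].append({"type": event_type, "entities": entities})
    return document


sections = [
    "Invoice paid on 2023-01-15 for $120.50.",
    "Thank you for your business.",
    "Order shipped on 2023-01-17.",
]
structured = structure_sections(sections)
print(json.dumps(structured, indent=2))
```

Each event becomes a parent node with its entities nested beneath it, which is one simple way to express the hierarchical link before handing the result to a downstream application.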

ENHANCING DOCUMENTS PORTRAYED IN DIGITAL IMAGES

The present disclosure is directed toward systems and methods that efficiently and effectively generate an enhanced document image of a displayed document in an image frame captured from a live image feed. For example, systems and methods described herein apply a document enhancement process to a displayed document in an image frame, resulting in an enhanced document image that is cropped, rectified, and un-shadowed, with dark text against a mostly white background. Additionally, systems and methods described herein determine whether a stored digital content item includes a displayed document. In response to determining that a stored digital content item does include a displayed document, systems and methods described herein generate an enhanced document image of the displayed document included in the stored digital content item.
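
To make the "un-shadowed" step more concrete, here is a minimal Python sketch of one common shadow-removal idea: divide each pixel by a local estimate of the paper brightness. This is a swapped-in technique chosen for illustration, not the enhancement process of the disclosure; the `remove_shadow` function, the window-maximum background estimate, and the sample pixel values are all assumptions. Cropping and rectification are not covered.

```python
def remove_shadow(gray, radius=1):
    """Divide each pixel by a local background estimate (the window maximum)."""
    height, width = len(gray), len(gray[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            background = max(max(window), 1)  # avoid division by zero
            row.append(min(255, round(255 * gray[y][x] / background)))
        out.append(row)
    return out


# Left half lit (paper 200, text 40); right half in shadow (paper 100, text 20).
gray = [
    [200, 200, 100, 100],
    [200,  40, 100,  20],
    [200, 200, 100, 100],
]
enhanced = remove_shadow(gray)
```

After normalization the lit and shadowed text pixels end up at the same dark level, and paper pixels away from the shadow edge become white. Pixels right at the shadow boundary stay grey with this simple window, which is why practical pipelines usually use a smoother background estimate.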

METHOD AND APPARATUS FOR DETECTING ANOMALIES IN MISSION CRITICAL ENVIRONMENTS

A method including isolating a protocol language of a data set comprising a text structure representing data regarding a network communication procedure between a plurality of user devices, wherein the protocol language comprises a pattern for implementing the network communication procedure; generating a document from the data set, wherein the document includes a text structure; organizing, in light of the protocol language, the text structure into a natural language scheme; and detecting, using the natural language scheme, insights in the document.
