Patent classifications
G06V30/414
SYSTEM AND METHOD FOR FORMAT-AGNOSTIC DOCUMENT INGESTION
A system for format-agnostic document ingestion including a document ingestion server and a database is disclosed. The server is configured to receive an image of a document comprising text in an unknown format and to convert the image, using OCR, into a plurality of text elements, each having a content, a size, and an absolute position. The server is also configured to retrieve data detectors from the database, each associated with a data type anticipated to be in the document and comprising at least one identifier, a direction, and at least one validation criterion. The server is also configured to identify a potential descriptor by comparing the content of each text element with the at least one identifier, and then to determine whether the text element pointed to by the data detector meets the validation criterion. Finally, the server is configured to associate the validated text element with the data detector and to store the content.
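The detector lookup described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class names, the two supported directions ("right" and "below"), the half-height and half-width alignment tolerances, and the use of a regular expression as the validation criterion are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class TextElement:
    # Hypothetical OCR output: content plus absolute position and size.
    content: str
    x: float
    y: float
    width: float
    height: float


@dataclass
class DataDetector:
    # Hypothetical detector: label strings, a direction, and a regex criterion.
    data_type: str
    identifiers: list
    direction: str   # "right" or "below" (assumed set of directions)
    validation: str  # regex the candidate value must fully match


def find_neighbor(anchor, elements, direction):
    """Return the nearest element in the given direction from anchor, or None."""
    if direction == "right":
        candidates = [e for e in elements
                      if e is not anchor and e.x > anchor.x
                      and abs(e.y - anchor.y) <= anchor.height / 2]
        return min(candidates, key=lambda e: e.x - anchor.x, default=None)
    if direction == "below":
        candidates = [e for e in elements
                      if e is not anchor and e.y > anchor.y
                      and abs(e.x - anchor.x) <= anchor.width / 2]
        return min(candidates, key=lambda e: e.y - anchor.y, default=None)
    raise ValueError(f"unsupported direction: {direction}")


def extract(elements, detectors):
    """Match identifiers, follow the direction, validate, and collect values."""
    results = {}
    for det in detectors:
        wanted = {i.lower() for i in det.identifiers}
        for el in elements:
            label = el.content.strip().rstrip(":").lower()
            if label not in wanted:
                continue  # not a potential descriptor for this detector
            value = find_neighbor(el, elements, det.direction)
            if value is not None and re.fullmatch(det.validation, value.content.strip()):
                results[det.data_type] = value.content.strip()
                break
    return results


elements = [
    TextElement("Invoice No:", 10, 10, 80, 12),
    TextElement("INV-0042", 100, 11, 60, 12),
    TextElement("Date", 10, 40, 30, 12),
    TextElement("2024-01-15", 12, 60, 70, 12),
]
detectors = [
    DataDetector("invoice_number", ["invoice no", "invoice #"], "right", r"INV-\d+"),
    DataDetector("date", ["date"], "below", r"\d{4}-\d{2}-\d{2}"),
]
extracted = extract(elements, detectors)
```

Because each detector carries its own direction, the same detector set can handle layouts where a value sits beside its label and layouts where it sits beneath it, which is one plausible reading of "format-agnostic" here.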
MACHINE LEARNING BASED END-TO-END EXTRACTION OF TABLES FROM ELECTRONIC DOCUMENTS
In some embodiments, a method includes identifying a set of word bounding boxes in a first electronic document, and identifying locations of horizontal white space between two adjacent rows from a set of rows in a table. The method includes determining, using a Natural Language Processing algorithm, an entity name from a set of entity names for each table cell from a set of table cells in the table. The method includes determining, using a machine learning algorithm, a class from a set of classes for each row from the set of rows. The method includes extracting a set of table cell values associated with the set of table cells, and generating a second electronic document including the set of table cell values arranged in the set of rows and the set of columns such that the set of words in the table are computer-readable in the second electronic document.
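The first step of this abstract, locating horizontal white space between adjacent rows from word bounding boxes, can be sketched as an interval sweep over the vertical extents of the boxes. This is only an illustration under assumptions: the `(x0, y0, x1, y1)` box layout with y increasing downward, the function name `row_gaps`, and the `min_gap` threshold are hypothetical and do not come from the patent.

```python
def row_gaps(boxes, min_gap=2.0):
    """Find horizontal white-space bands between rows of word boxes.

    boxes: iterable of (x0, y0, x1, y1) tuples, y increasing downward (assumed).
    Returns a list of (gap_top, gap_bottom) pairs at least min_gap tall.
    """
    # Sort the vertical extents of all boxes by their top edge.
    intervals = sorted((y0, y1) for _, y0, _, y1 in boxes)
    gaps = []
    if not intervals:
        return gaps
    current_bottom = intervals[0][1]
    for top, bottom in intervals[1:]:
        # A band with no box in it separates two adjacent rows.
        if top - current_bottom >= min_gap:
            gaps.append((current_bottom, top))
        current_bottom = max(current_bottom, bottom)
    return gaps


word_boxes = [
    (0, 0, 40, 10), (50, 1, 90, 11),    # first row
    (0, 20, 40, 30), (50, 21, 90, 31),  # second row
    (0, 45, 40, 55),                    # third row
]
gaps = row_gaps(word_boxes)
```

Each returned band marks a candidate row boundary. Entity naming and row classification, which the abstract assigns to NLP and machine learning algorithms, are outside the scope of this sketch.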
METHOD, APPARATUS, DEVICE AND STORAGE MEDIUM FOR DETECTING A CARD SURFACE PICTURE
The invention discloses a method, an apparatus, a device and a storage medium for detecting a card surface picture. The method comprises the following steps: identifying region information of an image to be shown in a target picture; generating a picture to be detected according to the region information of the image to be shown; synthesizing the picture to be detected and a preset picture to obtain a synthesized picture; and, in response to detecting that a fourth pixel value is contained in the synthesized picture, determining that the target picture is unqualified. This improves the efficiency of picture review and, in turn, the efficiency of card fabrication.
TECHNIQUES FOR ENHANCING AN ELECTRONIC DOCUMENT WITH AN INTERACTIVE WORKFLOW
The present patent application describes techniques for generating an enhanced electronic document that may include one or more graphical user interface (GUI) elements that comprise an interactive workflow. An electronic document is automatically processed to identify patterns within the content of the document that indicate individual content items, such as individual steps or instructions associated with a task described in the document, or individual input fields at which information is to be recorded. For each individual content item identified, a data object (e.g., a JSON object) is added to a file, which is ultimately embedded within the original document to create an enhanced electronic document. When the enhanced electronic document is presented via an appropriate document viewing application of a hands-free computing device, the content of the document is presented in combination with the interactive GUI elements so the end-user can interact with the content and GUI elements via audible (spoken) commands.
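The pattern-identification step in this abstract can be illustrated with a short sketch that scans plain text for numbered steps and fill-in fields and emits one JSON object per item. The regular expressions, the object keys (`type`, `number`, `text`, `label`, `line`), and the function name are assumptions for illustration. The abstract does not specify the patterns or the schema, and embedding the resulting file in the original document is not shown.

```python
import json
import re

# Hypothetical patterns: "1. text", "2) text", or "Step 3: text" for steps,
# and "Label: ____" (three or more underscores) for input fields.
STEP_PATTERN = re.compile(r"^\s*(?:Step\s+)?(\d+)[.):]\s+(.*\S)\s*$", re.IGNORECASE)
FIELD_PATTERN = re.compile(r"^\s*(.+?):\s*_{3,}\s*$")


def build_workflow(text):
    """Return a list of dicts, one per detected step or input field."""
    items = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        field = FIELD_PATTERN.match(line)
        step = STEP_PATTERN.match(line)
        if field:
            items.append({"type": "input_field",
                          "label": field.group(1).strip(),
                          "line": line_no})
        elif step:
            items.append({"type": "step",
                          "number": int(step.group(1)),
                          "text": step.group(2),
                          "line": line_no})
    return items


document_text = (
    "Pump inspection\n"
    "1. Close the inlet valve.\n"
    "2. Record the pressure reading.\n"
    "Pressure (psi): ______\n"
    "Notes on context.\n"
)
workflow_items = build_workflow(document_text)
workflow_json = json.dumps(workflow_items, indent=2)
```

A viewer on a hands-free device could, in principle, map each object to a GUI element such as a checkbox for a step or a dictation target for an input field. That mapping is described only at a high level in the abstract.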
AUTOMATED DOCUMENT PROCESSING FOR DETECTING, EXTRACTING, AND ANALYZING TABLES AND TABULAR DATA
According to one embodiment, a computer-implemented method for detecting and classifying columns of tables and/or tabular data arrangements within image data includes: detecting one or more tables and/or one or more tabular data arrangements within the image data; extracting the one or more tables and/or the one or more tabular data arrangements from the processed image data; and classifying either: a plurality of columns of the one or more extracted tables; a plurality of columns of the one or more extracted tabular data arrangements; or both the columns of the one or more extracted tables and the columns of the one or more extracted tabular data arrangements. Corresponding systems and computer program products are also disclosed.