Patent classifications
G06F40/258
SYSTEMS AND PROCESSES OF EXTRACTING UNSTRUCTURED DATA FROM COMPLEX DOCUMENTS
The present disclosure relates generally to data extraction from complex documents and, more particularly, to systems, processes and computer program products configured to automatically extract unstructured data from complex documents and perform table understanding on the extracted data. For example, the method includes: detecting, by the computer system, one or more tables within a digitized document; classifying, by the computer system, the one or more detected tables into at least a first table type; identifying, by the computer system, headers within the first table type; extracting, by the computer system, data within the headers and body cells of the first table type; and mapping, by the computer system, a relationship between the extracted data within the headers and the body cells.
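The classify, identify, extract and map steps can be illustrated with a minimal sketch. It assumes the table has already been detected and digitized into rows of strings; the names `classify_table` and `map_headers_to_cells`, the "column_header" type label, and the rule that a header row contains no numeric cells are all hypothetical choices made for illustration, not the method of the disclosure.

```python
from typing import Dict, List


def is_number(text: str) -> bool:
    """Return True if the cell text parses as a number (commas ignored)."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def classify_table(rows: List[List[str]]) -> str:
    """Toy classifier (assumption): a first row with no numeric cells
    is treated as a row of column headers."""
    if rows and not any(is_number(cell) for cell in rows[0]):
        return "column_header"
    return "unknown"


def map_headers_to_cells(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Pair every body cell with the header of its column."""
    headers, body = rows[0], rows[1:]
    return [dict(zip(headers, row)) for row in body]


# Illustrative table, already detected and split into cells.
table = [
    ["Item", "Qty", "Price"],
    ["Widget", "2", "9.50"],
    ["Gadget", "1", "24.00"],
]

table_type = classify_table(table)
records = map_headers_to_cells(table) if table_type == "column_header" else []
```

Each entry in `records` is a dictionary keyed by header text, which is one simple way to represent the relationship between headers and body cells.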
Keyword extraction method, apparatus and medium
A keyword extraction method includes: extracting candidate words from an original document to form a first word set; acquiring a first association degree between each first word thereof and the original document, and determining a second word set according to the first association degree; for each second word in the second word set, inquiring, in a word association topology, at least one node word satisfying a condition of association with the second word and forming a third word set, the word association topology indicating an association relation among multiple node words in a predetermined field; and determining a union set of the second and third word sets, acquiring a second association degree between each candidate keyword in the union set and the original document, and selecting, according to the second association degree, at least one candidate keyword from the union set, to form a keyword set of the original document.
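The flow of the three word sets can be sketched as follows. The abstract does not say how either association degree is computed, so this sketch assumes relative term frequency for the first degree and term frequency plus half the frequency of topology neighbors for the second; the stopword list, the `TOPOLOGY` dictionary and all function names are hypothetical.

```python
import re
from typing import Dict, List, Set

STOPWORDS = {"the", "if", "a", "an", "of"}  # assumption: tiny illustrative list

# Hypothetical word association topology for one field (symmetric adjacency).
TOPOLOGY: Dict[str, Set[str]] = {
    "battery": {"charger"},
    "charger": {"battery"},
}


def frequency(word: str, tokens: List[str]) -> float:
    """First association degree (assumption): relative term frequency."""
    return tokens.count(word) / len(tokens)


def second_degree(word: str, tokens: List[str]) -> float:
    """Second association degree (assumption): own frequency plus half
    the frequency of the word's neighbors in the topology."""
    neighbors = TOPOLOGY.get(word, set())
    return frequency(word, tokens) + 0.5 * sum(frequency(n, tokens) for n in neighbors)


def extract_keywords(text: str, threshold: float, top_k: int) -> List[str]:
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    first_set = set(tokens)                                   # candidate words
    second_set = {w for w in first_set if frequency(w, tokens) >= threshold}
    third_set: Set[str] = set()
    for word in second_set:                                   # topology lookup
        third_set |= TOPOLOGY.get(word, set())
    union = second_set | third_set
    ranked = sorted(union, key=lambda w: (-second_degree(w, tokens), w))
    return ranked[:top_k]


document = "The battery drains fast. Replace the battery if the battery swells."
keywords = extract_keywords(document, threshold=0.2, top_k=2)
```

Note that "charger" never appears in the document; it enters the result only through the topology lookup, which is the point of the third word set.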
Method and device to automatically identify themes and based thereon derive path designator proxy indicia
Methods, devices and computer program products are provided that, under control of one or more processors, perform resource theme identification (RTI) automatically by: accessing an active resource that includes a path designator (PD) element that includes at least a portion of a path designator for a resource; analyzing the active resource to identify a text element, an audio element and/or an image element; analyzing the text/image element utilizing an RTI algorithm that applies at least one of natural language understanding (NLU) or image recognition (IR) to identify one or more themes; deriving proxy indicia based on the theme(s); substituting, into the active resource, the proxy indicia for the path designator to present the proxy indicia in place of the path designator, the proxy indicia linked to the path designator; and displaying the active resource including the proxy indicia.
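A rough sketch of the substitution step follows. It swaps in plain keyword matching where the abstract calls for NLU or image recognition, treats a URL as the path designator, and uses a bracketed theme name as the proxy indicia; `THEME_KEYWORDS`, `identify_theme` and `substitute_proxy` are hypothetical names introduced only for this example.

```python
import re
from typing import Dict, List, Set, Tuple

# Assumption: keyword sets stand in for an NLU-based theme model.
THEME_KEYWORDS: Dict[str, Set[str]] = {
    "weather": {"forecast", "rain", "storm"},
    "sports": {"score", "match", "league"},
}

URL_PATTERN = re.compile(r"https?://\S+")


def identify_theme(text: str) -> str:
    """Pick the theme whose keywords overlap the text the most."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best = max(THEME_KEYWORDS, key=lambda theme: len(words & THEME_KEYWORDS[theme]))
    return best if words & THEME_KEYWORDS[best] else "link"


def substitute_proxy(message: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Replace each URL with a theme-based proxy and keep the proxy-to-URL links."""
    theme = identify_theme(URL_PATTERN.sub("", message))
    proxy = f"[{theme}]"
    links = [(proxy, url) for url in URL_PATTERN.findall(message)]
    return URL_PATTERN.sub(proxy, message), links


message = "Heavy rain and a storm forecast for Friday: https://example.com/a1b2c3"
display_text, links = substitute_proxy(message)
```

The returned `links` list preserves the association between each proxy and its original path designator, so a display layer could still resolve the proxy to the underlying resource.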
User-specific text record-based format prediction
A method identifies a text region in an electronic document. The method determines that the text region includes a candidate text portion that is a candidate for applying a formatting suggestion based on a comparison of the text region with predetermined patterns. The method identifies a stored text record that corresponds to the candidate text portion. The method confirms whether the formatting type is appropriate for the candidate text portion based on individual word matches between the candidate text portion and the stored text record. The method notifies a user of the electronic document of the formatting suggestion according to the formatting type.
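The pattern check followed by word-level confirmation might look like the sketch below. It assumes a single hypothetical formatting type, "contact_link", whose predetermined pattern is two adjacent capitalized words, and a small in-memory list of stored text records; none of these specifics come from the abstract.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Assumption: one predetermined pattern per formatting type.
PATTERNS: Dict[str, Pattern[str]] = {
    "contact_link": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),
}

# Hypothetical user-specific stored text records (for example, contact names).
STORED_RECORDS: List[str] = ["Ada Lovelace", "Grace Brewster Hopper"]


def suggest_formats(text_region: str) -> List[Tuple[str, str, str]]:
    """Return (candidate text, formatting type, matching record) triples."""
    suggestions = []
    for format_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text_region):
            candidate_words = set(match.group(0).lower().split())
            for record in STORED_RECORDS:
                # Confirm only if every candidate word occurs in the record.
                if candidate_words <= set(record.lower().split()):
                    suggestions.append((match.group(0), format_type, record))
                    break
    return suggestions


region = "Please send the draft to Grace Hopper by Monday."
suggestions = suggest_formats(region)
```

A candidate such as "Grace Hopper" is confirmed against the record "Grace Brewster Hopper" because each of its words matches individually, even though the full strings differ.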
Selectively targeting content section for cognitive analytics and search
A computer system includes a natural language processing (NLP) unit, a storage unit, a user interface and a search engine. The NLP unit analyzes a content source to identify one or more sections containing searchable content and generate section metadata respective to each identified section included in the content source. The storage unit stores the section metadata and the user interface receives a section-scoped query aimed at searching an identified section corresponding to the at least one first section metadata stored in the storage unit without searching an identified section corresponding to at least one second section metadata stored in the storage unit. Based on the section-scoped query, the search engine analyzes the at least one first section metadata stored in the storage unit without analyzing the at least one second section metadata.
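A section-scoped query can be illustrated with a small sketch. It assumes sections are marked by lines starting with "# " and that the section metadata is simply the section name mapped to its text, with substring matching in place of a real NLP unit and search engine; `build_section_metadata` and `search` are hypothetical names.

```python
from typing import Dict, List, Optional, Set


def build_section_metadata(document: str) -> Dict[str, str]:
    """Split a document into named sections (assumption: '# ' marks a heading)."""
    sections: Dict[str, List[str]] = {}
    current = "Preamble"
    for line in document.splitlines():
        if line.startswith("# "):
            current = line[2:].strip()
            sections.setdefault(current, [])
        elif line.strip():
            sections.setdefault(current, []).append(line.strip())
    return {name: " ".join(lines) for name, lines in sections.items()}


def search(metadata: Dict[str, str], query: str,
           scope: Optional[Set[str]] = None) -> List[str]:
    """Return names of sections containing the query; sections outside
    the scope are skipped without being analyzed."""
    hits = []
    for name, text in metadata.items():
        if scope is not None and name not in scope:
            continue
        if query.lower() in text.lower():
            hits.append(name)
    return hits


document = "# Methods\nWe sampled soil cores.\n# Results\nSoil pH rose after liming.\n"
metadata = build_section_metadata(document)
scoped_hits = search(metadata, "soil", scope={"Results"})
all_hits = search(metadata, "soil")
```

With the scope set to "Results", the "Methods" section is never examined even though it also mentions soil.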
Systems and methods for automatic and adaptive browser bookmarks
A dynamic browser bookmark system configured to present a user's bookmarks in a hierarchical manner based on webpage contextual data is disclosed. The system may aggregate browser bookmarks associated with a user device. The system may determine a first webpage being viewed on the user device. The system may dynamically filter the browser bookmarks to determine a first plurality of browser bookmarks associated with the first webpage. The system may generate and transmit a graphical user interface including a hierarchical view of the browser bookmarks. The system may determine a second webpage being viewed on the user device and dynamically filter the aggregated browser bookmarks to determine a second plurality of browser bookmarks relevant to the second webpage, which may include a browser bookmark of the first plurality of browser bookmarks, and generate a second graphical user interface including a hierarchical view of the second plurality of browser bookmarks.
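The filter-then-group behavior can be sketched as below. The abstract does not define "webpage contextual data", so this example assumes the host name of the page being viewed is the context and that the hierarchy is built from the first path segment of each bookmark; both choices and the function names are hypothetical.

```python
from typing import Dict, List
from urllib.parse import urlparse


def filter_bookmarks(bookmarks: List[str], current_url: str) -> List[str]:
    """Keep bookmarks on the same host as the page being viewed (assumption)."""
    host = urlparse(current_url).netloc
    return [b for b in bookmarks if urlparse(b).netloc == host]


def hierarchical_view(bookmarks: List[str]) -> Dict[str, List[str]]:
    """Group bookmarks by the first segment of their URL path."""
    tree: Dict[str, List[str]] = {}
    for bookmark in bookmarks:
        first_segment = urlparse(bookmark).path.strip("/").split("/")[0] or "(root)"
        tree.setdefault(first_segment, []).append(bookmark)
    return tree


aggregated = [
    "https://docs.example.com/api/auth",
    "https://docs.example.com/api/users",
    "https://docs.example.com/guide/start",
    "https://news.example.org/today",
]

# First page viewed: a documentation page.
first_view = hierarchical_view(filter_bookmarks(aggregated, "https://docs.example.com/api/tokens"))
# Second page viewed: a news page, which yields a different filtered set.
second_view = hierarchical_view(filter_bookmarks(aggregated, "https://news.example.org/"))
```

Re-running the two functions whenever the viewed page changes gives the dynamic behavior described, with each call producing the data for a fresh hierarchical view.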
Extract Data From A True PDF Page
The system may perform a method comprising analyzing metadata of a text layer of a page of a first PDF document to determine that the first PDF document is a first true PDF document; receiving the first true PDF document, in response to the first PDF document being the first true PDF document; receiving a selection of a field including first data to be extracted from the first true PDF document; displaying the first data; creating a template including the coordinates corresponding to the selected field and the first data of the first true PDF document; and extracting from an accessible text layer of a second true PDF document, second data based on the template from the first true PDF document.
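The template idea can be sketched without a PDF library by modeling a page's text layer as a list of positioned words. Everything here is an assumption made for illustration: the `Word` type, the bounding-box template format, and in particular `is_true_pdf`, which treats any non-empty text layer as a true PDF rather than inspecting text-layer metadata as the abstract describes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) page coordinates


@dataclass
class Word:
    text: str
    x: float
    y: float


def is_true_pdf(text_layer: List[Word]) -> bool:
    """Simplification (assumption): a page with a non-empty text layer is 'true'."""
    return len(text_layer) > 0


def words_in_box(text_layer: List[Word], box: Box) -> str:
    """Join the words whose position falls inside the box, in reading order."""
    x0, y0, x1, y1 = box
    hits = [w for w in text_layer if x0 <= w.x <= x1 and y0 <= w.y <= y1]
    return " ".join(w.text for w in sorted(hits, key=lambda w: (w.y, w.x)))


def extract_with_template(text_layer: List[Word], template: Dict[str, Box]) -> Dict[str, str]:
    """Apply every field's coordinates from the template to a page."""
    return {field: words_in_box(text_layer, box) for field, box in template.items()}


# First document: the user selects the region holding the invoice number.
first_page = [Word("Invoice", 50, 40), Word("No:", 110, 40), Word("A-1001", 150, 40)]
template = {"invoice_no": (140.0, 30.0, 220.0, 50.0)}
first_data = extract_with_template(first_page, template)

# Second document with the same layout: the template is reused unchanged.
second_page = [Word("Invoice", 50, 40), Word("No:", 110, 40), Word("B-2042", 150, 40)]
second_data = extract_with_template(second_page, template) if is_true_pdf(second_page) else {}
```

Because the template stores coordinates rather than text, it carries over to any later document that shares the first document's layout.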