Patent classifications
G06F40/258
Extract Data From A True PDF Page
The system may perform a method comprising: analyzing metadata of a text layer of a page of a first PDF document to determine that the first PDF document is a first true PDF document; receiving the first true PDF document in response to the first PDF document being determined to be a true PDF document; receiving a selection of a field including first data to be extracted from the first true PDF document; displaying the first data; creating a template including coordinates corresponding to the selected field and the first data of the first true PDF document; and extracting, from an accessible text layer of a second true PDF document, second data based on the template from the first true PDF document.
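The template step can be illustrated with a minimal sketch. It assumes the text layer has already been read into words with bounding boxes; the `Word` and `Box` types, the `extract_fields` helper, and the sample coordinates are hypothetical and are not taken from the patent or from any PDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One word from a page's text layer (hypothetical representation)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class Box:
    """Coordinates of a field selected on the first document."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, word: Word) -> bool:
        # A word belongs to the field if its center lies inside the box.
        cx = (word.x0 + word.x1) / 2
        cy = (word.y0 + word.y1) / 2
        return self.x0 <= cx <= self.x1 and self.y0 <= cy <= self.y1


def extract_fields(words, template):
    """Apply a template (field name -> Box) to the words of another page."""
    result = {}
    for name, box in template.items():
        hits = [w for w in words if box.contains(w)]
        hits.sort(key=lambda w: (w.y0, w.x0))  # top-to-bottom, left-to-right
        result[name] = " ".join(w.text for w in hits)
    return result


# Template created from the field selected on the first document.
template = {"invoice_no": Box(100, 40, 200, 60)}

# Words read from the text layer of a second document with the same layout.
second_page = [
    Word("Invoice", 20, 45, 90, 55),
    Word("INV-0042", 105, 45, 180, 55),
    Word("Total", 20, 100, 60, 110),
]

fields = extract_fields(second_page, template)
```

Matching on the word center rather than on full containment is a design choice made here for tolerance to small layout shifts; the abstract does not specify how coordinates are compared.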
Enhance a mail application to format a long email conversation for easy consumption
Systems and methods for automatically generating conversation-based reports from email threads, for easier and more intuitive user consumption, may include a parser configured to identify all related emails, extract relevant portions of each email (including embedded or in-line comments within quoted portions), and generate a single report document that presents the conversation in chronological order. Duplicate portions of each email are automatically removed and excluded from the report, which reduces memory and bandwidth requirements and makes the report more intuitive and easier to read. Attachments to the emails may be included in the report, with additional deduplication to further reduce memory and bandwidth requirements.
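A rough sketch of the chronological, deduplicated report might look like the following. The `Email` type, the `>` quote-marker convention, and the line-level deduplication rule are assumptions made for illustration; the abstract does not describe the parser at this level of detail.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Email:
    sender: str
    sent: datetime
    body: str


def normalize(line: str) -> str:
    """Strip quote markers ('>') and surrounding whitespace from a line."""
    return line.lstrip("> ").strip()


def build_report(emails) -> str:
    """Return one chronological report with already-seen lines removed."""
    seen = set()
    sections = []
    for mail in sorted(emails, key=lambda m: m.sent):
        fresh = []
        for line in mail.body.splitlines():
            key = normalize(line)
            if not key or key in seen:
                continue  # blank line, or text quoted from an earlier email
            seen.add(key)
            fresh.append(key)
        if fresh:
            header = f"{mail.sender} ({mail.sent:%Y-%m-%d %H:%M}):"
            sections.append(header + "\n" + "\n".join(fresh))
    return "\n\n".join(sections)


thread = [
    Email(
        "bob",
        datetime(2024, 1, 2, 9, 0),
        "> Can we ship Friday?\nYes, if QA signs off.\n"
        "> Who owns the release notes?\nI do.",
    ),
    Email(
        "alice",
        datetime(2024, 1, 1, 17, 0),
        "Can we ship Friday?\nWho owns the release notes?",
    ),
]

report = build_report(thread)
```

In this sketch the in-line replies inside the quoted portion of the second email survive, while the quoted questions appear only once, under the original sender. Line-level matching would also drop a legitimately repeated line such as "Thanks", so a real parser would need a more careful notion of a duplicate portion.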
Identifying sequence headings in a document
A method for processing an electronic document (ED) to infer a sequence of section headings in the ED. The method includes generating, by a computer processor, based on regular expression matching of a predetermined section heading pattern and a plurality of characters in the ED, a list of candidate headings in the ED; generating, by the computer processor and based on the list of candidate headings, a list of chain fragments for inferring a portion of the sequence of section headings; and generating, by the computer processor and based on predetermined criteria, the sequence of section headings by merging at least two chain fragments in the list of chain fragments.
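The three stages (regular-expression candidates, chain fragments, merging) can be sketched as below. The heading pattern `N. Title`, the rule that a fragment is a run of consecutive numbers, and the greedy merge criterion are all assumptions standing in for the "predetermined" pattern and criteria, which the abstract does not spell out.

```python
import re

# Assumed heading pattern: a number, a period, then a title on the same line.
HEADING = re.compile(r"^(\d+)\.[ \t]+(\S.*)$", re.MULTILINE)


def candidate_headings(text):
    """Return (number, title, offset) for every line matching the pattern."""
    return [
        (int(m.group(1)), m.group(2).strip(), m.start())
        for m in HEADING.finditer(text)
    ]


def chain_fragments(candidates):
    """Group candidates into runs whose numbers increase by exactly one."""
    fragments = []
    for cand in candidates:
        if fragments and cand[0] == fragments[-1][-1][0] + 1:
            fragments[-1].append(cand)
        else:
            fragments.append([cand])
    return fragments


def merge_fragments(fragments):
    """Greedy merge: start at a fragment beginning with 1, then append each
    later fragment that continues the numbering; skip the others."""
    sequence = []
    for frag in fragments:
        if not sequence:
            if frag[0][0] == 1:
                sequence = list(frag)
        elif frag[0][0] == sequence[-1][0] + 1:
            sequence.extend(frag)
    return sequence


document = """1. Introduction
Background text.
2. Methods
The procedure was:
1. Collect data
2. Clean data
3. Validate data
3. Results
4. Discussion
"""

fragments = chain_fragments(candidate_headings(document))
sequence = merge_fragments(fragments)
titles = [title for _, title, _ in sequence]
```

Here the numbered list inside the Methods section forms its own fragment and is skipped during merging, because it does not continue the numbering of the first fragment. A greedy rule like this can pick the wrong continuation in harder documents, which is presumably why the method relies on explicit criteria.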
Systems and methods for generating dynamic annotations
A system for managing media content annotations is configured to generate annotations having a format similar to a title and tailored to a user profile. The system identifies a media content item and identifies a user entity. The system selects from among a plurality of annotations linked to the media content item and stored in metadata. For example, the system may generate more than one annotation, generate links between each annotation and user profile information, and then select among the annotations for the most appropriate annotation for a given user. The annotation may include keywords or entities that are included in, linked to, or otherwise associated with the user profile information. The system outputs, or generates for output, a display that includes a representation of the media content item and the selected annotation.
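The selection step can be pictured with a small sketch in which each stored annotation carries linked keywords and the one with the largest overlap with the user profile is chosen. The `Annotation` type, the keyword-overlap score, and the sample data are hypothetical; the abstract does not say how "most appropriate" is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    text: str
    keywords: frozenset


def select_annotation(annotations, profile_keywords):
    """Pick the annotation sharing the most keywords with the user profile.

    Ties are resolved in favor of the earliest annotation in the list.
    """
    profile = {k.lower() for k in profile_keywords}
    return max(
        annotations,
        key=lambda a: len(profile & {k.lower() for k in a.keywords}),
    )


# Annotations stored in the metadata of one media content item.
annotations = [
    Annotation("A heist thriller set in Paris", frozenset({"heist", "paris"})),
    Annotation("Award-winning score by a jazz trio", frozenset({"jazz", "music"})),
]

chosen = select_annotation(annotations, ["Jazz", "Documentaries", "Music"])
```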
System and method for identification and profiling adverse events
With the proliferation of data and documents available on the internet and other information sources, analysis of adverse events poses a serious technical challenge on account of the associated data volume and variety. This disclosure relates generally to identification and profiling of adverse events. A set of articles is received from a plurality of data sources, and a series of natural language processing (NLP) techniques is employed to identify implicit and explicit adverse events. Entity statistics and sentiment extraction and analysis are performed. An ontology-based adverse event identification framework is proposed for identification and profiling of implicit adverse events. An attention-based bi-directional long short-term memory network is proposed for adverse event identification and classification.
Device and method for automatically generating domain-specific image caption by using semantic ontology
An apparatus for automatically generating a domain-specific image caption using a semantic ontology is provided. The apparatus includes a caption generator configured to generate an image caption in the form of a sentence describing an image provided from a client, in which the client includes a user device, and the caption generator includes a server connected to the user device through a wired/wireless communication method.
Utilizing visual and textual aspects of images with recommendation systems
Described herein are systems and methods for generating an embedding—a learned representation—for an image. The embedding for the image is derived to capture visual aspects, as well as textual aspects, of the image. An encoder-decoder is trained to generate the visual representation of the image. An optical character recognition (OCR) algorithm is used to identify text/words in the image. From these words, an embedding is derived by performing an average pooling operation on pre-trained embeddings that map to the identified words. Finally, the embedding representing the visual aspects of the image is combined with the embedding representing the textual aspects of the image to generate a final embedding for the image.
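The textual half of the pipeline can be sketched in plain Python. The word list stands in for OCR output and the visual vector stands in for the encoder output; the toy two-dimensional word vectors, the skipping of out-of-vocabulary words, and the use of concatenation as the combining step are assumptions, since the abstract only says the two embeddings are "combined".

```python
def average_pool(words, word_vectors, dim):
    """Average the pre-trained vectors of the recognized words.

    Words without a pre-trained vector are skipped; if none remain, a zero
    vector of length `dim` is returned.
    """
    vectors = [word_vectors[w] for w in words if w in word_vectors]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def combine(visual, textual):
    """Combine the two embeddings by concatenation (an assumed choice)."""
    return list(visual) + list(textual)


# Toy pre-trained word embeddings (hypothetical values).
word_vectors = {"sale": [1.0, 0.0], "shoes": [0.0, 1.0]}

# Words an OCR step might return for the image; "xyzzy" has no vector.
ocr_words = ["sale", "shoes", "xyzzy"]

# Placeholder for the embedding produced by the trained encoder.
visual_embedding = [0.2, 0.4, 0.6]

textual_embedding = average_pool(ocr_words, word_vectors, dim=2)
final_embedding = combine(visual_embedding, textual_embedding)
```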