Patent classifications
G06F40/117
Detecting truncation and overlap defects on webpage
A computer-implemented method, system and computer program product for detecting truncation and overlap defects. Location and size information for the elements of the webpage is obtained. An intersection over union (IoU) calculation is performed for two webpage elements using the obtained location and size information for at least one of these webpage elements. Furthermore, the location relationship between these two webpage elements is determined. A table, which defines truncation defect and overlap defect scenarios, is then reviewed to determine whether these two webpage elements exhibit any truncation or overlap defects, using the IoU calculation, the location relationship and the text condition, which indicates whether text is included in one of the two webpage elements. If any truncation or overlap defects are found in the webpage, they are marked on a screen capture of the webpage.
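The IoU calculation and location relationship described in this abstract can be illustrated with a minimal sketch, assuming each webpage element is reduced to an axis-aligned rectangle given by its top-left corner, width and height. The names `Box`, `iou` and `relationship`, and the relationship labels, are hypothetical; the abstract does not give the contents of the defect table, so no table lookup is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical element rectangle: top-left corner plus size, in pixels."""
    x: float
    y: float
    width: float
    height: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    # Overlap along each axis, clamped at zero when the boxes do not meet.
    ix = max(0.0, min(a.x + a.width, b.x + b.width) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.height, b.y + b.height) - max(a.y, b.y))
    inter = ix * iy
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union > 0 else 0.0


def _contains(outer: Box, inner: Box) -> bool:
    return (outer.x <= inner.x and outer.y <= inner.y
            and inner.x + inner.width <= outer.x + outer.width
            and inner.y + inner.height <= outer.y + outer.height)


def relationship(a: Box, b: Box) -> str:
    """Assumed labels for the location relationship between two elements."""
    if iou(a, b) == 0.0:
        return "disjoint"
    if _contains(a, b):
        return "a_contains_b"
    if _contains(b, a):
        return "b_contains_a"
    return "partial"


if __name__ == "__main__":
    button = Box(0, 0, 10, 10)
    label = Box(5, 5, 10, 10)
    print(iou(button, label), relationship(button, label))
```

In this sketch a partial relationship with a non-zero IoU is the kind of input a defect table might flag as an overlap, but that mapping is an assumption and is not taken from the abstract.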
CONCISENESS RECONSTRUCTION OF A CONTENT PRESENTATION VIA NATURAL LANGUAGE PROCESSING
A method may include obtaining a document and using a first prediction model to generate text block scores for text blocks in the document, where a first text block of the text blocks is associated with a first text block score of the plurality of text block scores. The method also includes updating, in response to the first text block score for the first text block failing to satisfy a criterion, a modified version of the document with an indicator to set the first text block as a hidden text block in a presentation of the modified version. The method also includes generating a summarization of the first text block based on the words in the first text block and updating the modified version of the document to include the summarization. The method also includes providing the modified version of the document to a user device.
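The control flow of this abstract (score each text block, hide blocks that fail the criterion, attach a summarization) can be sketched as follows. This is only an illustration under stated assumptions: the criterion is taken to be "score at or above a threshold", and the prediction model and summarizer are passed in as plain callables, because the abstract does not specify either. `TextBlock`, `reconstruct` and the stand-in functions in the usage section are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TextBlock:
    text: str
    hidden: bool = False            # indicator: hide this block in the presentation
    summary: Optional[str] = None   # summarization shown in place of a hidden block


def reconstruct(blocks: List[TextBlock],
                score_fn: Callable[[str], float],
                summarize_fn: Callable[[str], str],
                threshold: float) -> List[TextBlock]:
    """Return a modified copy of the document's blocks.

    Assumption: a block "fails the criterion" when its score is below threshold.
    """
    modified = []
    for block in blocks:
        if score_fn(block.text) < threshold:
            modified.append(TextBlock(block.text, hidden=True,
                                      summary=summarize_fn(block.text)))
        else:
            modified.append(TextBlock(block.text))
    return modified


if __name__ == "__main__":
    # Stand-ins for the prediction model and the summarizer (not from the source).
    def toy_score(text: str) -> float:
        return 1.0 if len(text.split()) <= 8 else 0.0

    def toy_summary(text: str) -> str:
        return " ".join(text.split()[:3]) + " ..."

    doc = [TextBlock("Short and direct statement."),
           TextBlock("A much longer block of text that rambles on well past "
                     "the point where a reader stays engaged.")]
    for block in reconstruct(doc, toy_score, toy_summary, threshold=0.5):
        print(block.hidden, block.summary)
```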
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND RECORDING MEDIUM
Provided is an information processing device including an input detector that detects a designated position on a display, a tag generator that generates a tag image by associating the position detected by the input detector with input information to be input and displays the generated tag image on the display, and a display processor that displays, on the display, a plurality of pieces of the input information associated with each of a plurality of the tag images generated by the tag generator, based on the respective attributes of the plurality of tag images.
SYSTEM AND METHOD FOR AUTOMATICALLY TAGGING DOCUMENTS
Systems and methods (100) for automatically tagging electronic documents are disclosed. An input module receives (102) an electronic document to be tagged. A preprocessing module then preprocesses (104) the electronic document to be tagged. The preprocessing of the electronic document comprises extracting text from the electronic document to be tagged, replacing a number or a date in the extracted text with a predetermined symbol, and tokenizing the extracted text with the predetermined symbol into a plurality of tokens. After the preprocessing (104), a deep learning module determines (106) a tag for at least one of the plurality of tokens. The determined tag for the at least one token is then output (108) by an output module.
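The preprocessing step (104) can be illustrated with a minimal sketch, assuming the text has already been extracted as a string. The placeholder `<VAL>`, the regular expressions for dates and numbers, and the function name `preprocess` are assumptions made for illustration; the abstract does not state which symbol or patterns are used, and the deep learning tagging step (106) is not shown.

```python
import re

# Hypothetical predetermined symbol; the abstract does not name one.
PLACEHOLDER = "<VAL>"

# Assumed patterns: numeric dates such as 2023-05-01 or 01/05/2023,
# and integers or decimals with optional , or . separators.
DATE_RE = re.compile(r"\b\d{1,4}[-/.]\d{1,2}[-/.]\d{1,4}\b")
NUMBER_RE = re.compile(r"\b\d+(?:[.,]\d+)*\b")

# Tokens: the placeholder as one unit, then words, then single punctuation marks.
TOKEN_RE = re.compile(rf"{re.escape(PLACEHOLDER)}|\w+|[^\w\s]")


def preprocess(text: str) -> list:
    """Replace dates and numbers with the placeholder, then tokenize."""
    text = DATE_RE.sub(PLACEHOLDER, text)    # dates first, so they are not split
    text = NUMBER_RE.sub(PLACEHOLDER, text)  # then any remaining numbers
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(preprocess("Invoice 4711 due 2023-05-01, total 1,250.50 EUR."))
```

Replacing dates before numbers keeps a date from being broken into several numeric placeholders, which is one reasonable ordering rather than a detail confirmed by the source.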
SYSTEM AND METHOD FOR STATISTICAL SUBJECT IDENTIFICATION FROM INPUT DATA
Embodiments provide a system and method for statistical subject identification. The system takes texts, videos, audio, and images as input for which a subject needs to be identified. The system pre-processes the input data and generates n-grams and pre-processed text strings by removing stopwords, punctuation, and selected POS tags, and by applying lemmatization. A frequency distribution of the n-grams is computed, and a weightage is assigned to each n-gram. For each n-gram, the sum of weights across all text strings is computed, and a maximum weightage is identified. The ratio of these two values is assigned to each n-gram. The values computed for the n-grams have a non-normal distribution when observed statistically, so they are transformed to confidence values that follow a normal distribution. The system maps the n-grams to domains using a domain lexicon. Finally, these domains are aggregated and converged for subject identification based on a pre-annotated mapping dictionary.
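The n-gram weighting described here can be sketched as follows, under two loudly stated assumptions that the abstract does not confirm: the weight of an n-gram within one text string is its relative frequency in that string, and the "ratio of two" is the n-gram's summed weight divided by the largest summed weight over all n-grams. The function names are hypothetical, the input is assumed to be already pre-processed token lists, and the later transformation to normally distributed confidence values is not shown because the abstract does not name the transform.

```python
from collections import Counter
from typing import Dict, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_scores(texts: List[List[str]], n: int = 2) -> Dict[Tuple[str, ...], float]:
    """Sum per-text weights for each n-gram, then divide by the maximum sum."""
    totals: Counter = Counter()
    for tokens in texts:
        counts = Counter(ngrams(tokens, n))
        total = sum(counts.values())
        for gram, count in counts.items():
            # Assumed weight: relative frequency within this text string.
            totals[gram] += count / total
    if not totals:
        return {}
    max_weight = max(totals.values())
    return {gram: weight / max_weight for gram, weight in totals.items()}


if __name__ == "__main__":
    texts = [["solar", "panel", "cost"], ["solar", "panel", "output"]]
    for gram, score in ngram_scores(texts).items():
        print(gram, score)
```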
EXTRACTION OF TASKS FROM DOCUMENTS USING WEAKLY SUPERVISION
This disclosure relates to the extraction of tasks from documents based on a weakly supervised classification technique, where task extraction means identifying mentions of tasks in a document. Several prior-art approaches address the problem of event extraction; however, because of crucial distinctions between events and tasks, task extraction stands as a separate problem. The disclosure explicitly defines specific characteristics of tasks and creates labelled data at the word level, based on a plurality of linguistic rules, to train a word-level weakly supervised model for task extraction. The labelled data is created based on the plurality of linguistic rules for a non-negation aspect, a volitionality aspect, an expertise aspect and a plurality of generic aspects. Further, the disclosure includes a phrase expansion technique to capture the complete meaning expressed by the task, rather than a bare task mention that may not capture the entire meaning of the sentence.