IPIQ

G06F40/258

METHOD AND SYSTEM FOR GENERATING LARGE CODED DATA SET OF TEXT FROM TEXTUAL DOCUMENTS USING HIGH RESOLUTION LABELING

20170270096 · 2017-09-21 ·

A method and a system for generating a coded dataset of sentences with high-resolution labeling are provided herein. The method may include: obtaining a plurality of textual documents that are pre-classified, on a whole-document level, into topics; training one or more mixed-membership model unsupervised algorithms, implemented by a computer processor, based on said topics, to yield a distribution of sub-topics for each of the textual documents; and applying a transformation, implemented by a computer processor, to said distribution of sub-topics for each of the textual documents, to yield a topic tagging score for said sub-topics on a text-portion level.

Keyphrase extraction beyond language modeling

11250214 · 2022-02-15 ·

Microsoft Technology Licensing, Llc

A system for extracting a key phrase from a document includes a neural key phrase extraction model (“BLING-KPE”) having a first layer to extract a word sequence from the document, a second layer to represent each word in the word sequence by ELMo embedding, position embedding, and visual features, and a third layer to concatenate the ELMo embedding, the position embedding, and the visual features to produce hybrid word embeddings. A convolutional transformer models the hybrid word embeddings to n-gram embeddings, and a feedforward layer converts the n-gram embeddings into a probability distribution over a set of n-grams and calculates a key phrase score of each n-gram. The neural key phrase extraction model is trained on annotated data based on a labeled loss function to compute cross entropy loss of the key phrase score of each n-gram as compared with a label from the annotated dataset.
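The data flow described in this abstract (concatenate per-word features, pool words into n-grams, score each n-gram, normalize to a distribution, compute a cross-entropy loss) can be illustrated with a small pure-Python sketch. It is not the BLING-KPE model: mean pooling stands in for the convolutional transformer, a single linear layer stands in for the feedforward layer, and all function names and the toy vectors are hypothetical.

```python
import math


def hybrid_embedding(word_vec, pos_vec, visual_vec):
    """Concatenate the three per-word feature vectors into one embedding."""
    return list(word_vec) + list(pos_vec) + list(visual_vec)


def ngram_embeddings(embeddings, n):
    """Stand-in for the convolutional transformer (assumption):
    mean-pool every window of n consecutive word embeddings."""
    result = []
    for i in range(len(embeddings) - n + 1):
        window = embeddings[i:i + n]
        result.append([sum(column) / n for column in zip(*window)])
    return result


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def keyphrase_probs(ngrams, weights, bias=0.0):
    """Linear scoring layer (assumption) followed by a softmax,
    giving a probability distribution over the candidate n-grams."""
    scores = [sum(w * x for w, x in zip(weights, g)) + bias for g in ngrams]
    return softmax(scores)


def cross_entropy(probs, label_index):
    """Cross-entropy loss when exactly one n-gram is labeled as the key phrase."""
    return -math.log(probs[label_index])
```

With three one-dimensional feature vectors per word, `ngram_embeddings(embs, 2)` yields one vector per bigram, and `cross_entropy` is lowest when the labeled bigram receives the highest probability.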

TEXT PREDICTION METHOD, DEVICE AND STORAGE MEDIUM

20210390255 · 2021-12-16 ·

Beijing Baidu Netcom Science And Technology Co., Ltd.

A text prediction method includes: obtaining at least two sentences by segmenting a text to be predicted; obtaining at least one sentence set by grouping the at least two sentences based on a number of Central Processing Unit (CPU) cores in a target device, in which the target device is a device configured to perform a prediction operation; assigning each sentence set to a corresponding CPU core of the target device, and predicting each sentence set sentence by sentence through the corresponding CPU core to obtain a prediction result of each sentence set; and determining a prediction result of the text to be predicted based on the prediction result of each sentence set.
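The steps in this abstract (segment, group by core count, predict each set sentence by sentence, combine) can be sketched as follows. This is a minimal illustration under assumptions: the regex segmenter, the contiguous near-equal grouping, and `predict_sentence` (a word-count placeholder for a real model) are all hypothetical, and the process pool leaves core placement to the operating system rather than pinning each set to a specific core.

```python
import os
import re
from concurrent.futures import ProcessPoolExecutor


def split_sentences(text):
    """Naive segmentation on '.', '!' and '?' (assumption, not the patented method)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def group_by_cores(sentences, n_cores):
    """Split sentences into at most n_cores contiguous, near-equal sets."""
    n_sets = max(1, min(n_cores, len(sentences)))
    base, extra = divmod(len(sentences), n_sets)
    sets, start = [], 0
    for i in range(n_sets):
        end = start + base + (1 if i < extra else 0)
        sets.append(sentences[start:end])
        start = end
    return sets


def predict_sentence(sentence):
    """Hypothetical placeholder for a model call: returns the word count."""
    return len(sentence.split())


def predict_set(sentence_set):
    """Predict one sentence set, sentence by sentence."""
    return [predict_sentence(s) for s in sentence_set]


def predict_text(text, n_cores=None):
    """Run one worker process per sentence set and merge results in order."""
    n_cores = n_cores or os.cpu_count() or 1
    sets = group_by_cores(split_sentences(text), n_cores)
    with ProcessPoolExecutor(max_workers=len(sets)) as pool:
        per_set = list(pool.map(predict_set, sets))
    return [result for set_result in per_set for result in set_result]
```

On platforms that start worker processes by spawning, `predict_text` should be called from under an `if __name__ == "__main__":` guard.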

SYSTEMS AND METHODS FOR STRUCTURE AND HEADER EXTRACTION

20210390298 · 2021-12-16 ·

Thomson Reuters Enterprise Centre GmbH

Richard Anthony Pito

The present disclosure is directed towards systems and methods for extracting structure and headers from a body of text. This computational extraction is based on the visual and logical similarities between portions of text. Boilerplate is removed from chunks of text making up potential headers and the cleaned result is compared against other potential headers and the remainder of the body of text.

Method and system for generating parsed document from digital document

11200412 · 2021-12-14 ·

Innoplexus AG

A method and system for generating a parsed document from a digital document. The method includes segmenting the digital document into at least one section; classifying the at least one section of the digital document into at least one of the following classes: text class, table class, figure class, or noise class; identifying a reading order of the digital document; and processing each of the at least one section of the digital document. Furthermore, processing each of the at least one section of the digital document comprises extracting content from each of the at least one section based on the class; and structuring the extracted content based on the reading order to generate the parsed document.
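The last two steps of this abstract, class-specific extraction and structuring by reading order, can be sketched as below. The sketch assumes segmentation and classification have already happened, so each `Section` arrives with a class label and page coordinates. The `Section` type, the top-to-bottom then left-to-right ordering, and the pipe-delimited table format are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Section:
    kind: str   # "text", "table", "figure", or "noise"
    y: float    # top coordinate on the page (assumed available)
    x: float    # left coordinate on the page (assumed available)
    raw: str    # raw content recovered for this section


def reading_order(sections):
    """Assumed reading order: top to bottom, then left to right."""
    return sorted(sections, key=lambda s: (s.y, s.x))


def extract(section):
    """Extract content according to the section's class; noise yields None."""
    if section.kind == "text":
        return {"type": "text", "content": " ".join(section.raw.split())}
    if section.kind == "table":
        rows = [line.split("|") for line in section.raw.splitlines() if line.strip()]
        return {"type": "table", "rows": [[cell.strip() for cell in row] for row in rows]}
    if section.kind == "figure":
        return {"type": "figure", "caption": section.raw.strip()}
    return None


def parse_document(sections):
    """Structure the extracted content in reading order, dropping noise."""
    parsed = []
    for section in reading_order(sections):
        item = extract(section)
        if item is not None:
            parsed.append(item)
    return parsed
```

Keeping extraction in one dispatch function makes it easy to see that each class is handled differently while the ordering logic stays independent of content type.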

SUMMARIZING MULTI-MODAL CONVERSATIONS IN A MULTI-USER MESSAGING APPLICATION

20210383799 · 2021-12-09 ·

International Business Machines Corporation

An embodiment of a summarization application divides collected conversation data into media and text components. The application implements respective machine learning mechanisms to enhance modeling operations of the text and media components to identify key elements from the conversation. The application generates a headline banner from a group of key elements based on an analysis involving first predetermined criteria. The application also combines additional key elements to the group of key elements to form a second group of key elements. The application generates a summary from the second group of key elements based on a second analysis involving second predetermined criteria. The application presents, via a display, the headline banner according to a first output of the first key element analysis and the summary according to a second output of the second key element analysis.

SYSTEM AND METHOD FOR VALIDATING TABULAR SUMMARY REPORTS

20220198133 · 2022-06-23 ·

A unit and system to validate and correct clinical summary reports includes a parser to parse a clinical summary report and at least one associated source report at document, table and cell level and to generate a table object for each table in each report together with a matrix of cells for each table, each matrix cell represented as a table cell object having an index; an identifier to identify each individual table cell object, a fingerprint generator to determine a semantic fingerprint for each cell object, including a value, header text and row and column information; a cell matcher to match clinical study report table cell objects with corresponding linked source report table cell objects using the fingerprint; and a validator to compare values between the matched clinical study report table cell objects and the at least one associated table cell source report objects to provide a validation decision.
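The matching and validation stages can be illustrated with a short sketch. It is a simplification under stated assumptions: the abstract's semantic fingerprint includes the cell value, whereas the matching key here uses only the normalized table name, row label, and column header, so that cells with differing values can still be paired and flagged. The `Cell` type and the status strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    table: str        # table name or caption
    row_label: str    # row information
    col_header: str   # column header text
    value: str        # cell value as text


def _norm(text):
    """Lowercase and collapse whitespace so cosmetic differences do not block a match."""
    return " ".join(text.lower().split())


def fingerprint(cell):
    """Simplified matching key (assumption): position-describing text only."""
    return (_norm(cell.table), _norm(cell.row_label), _norm(cell.col_header))


def validate(summary_cells, source_cells):
    """Match each summary cell to a source cell and compare their values."""
    index = {fingerprint(c): c for c in source_cells}
    report = []
    for cell in summary_cells:
        match = index.get(fingerprint(cell))
        if match is None:
            status = "unmatched"
        elif match.value.strip() == cell.value.strip():
            status = "ok"
        else:
            status = "mismatch"
        report.append((cell, status))
    return report
```

Each entry in the returned report pairs a summary cell with a validation decision, which is the role the abstract assigns to the validator.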

METHOD AND DEVICE TO AUTOMATICALLY IDENTIFY THEMES AND BASED THEREON DERIVE PATH DESIGNATOR PROXY INDICIA

20220197866 · 2022-06-23 ·

Methods, devices and computer program products are provided that, under control of one or more processors, perform resource theme identification (RTI) automatically by: accessing an active resource that includes a path designator (PD) element that includes at least a portion of a path designator for a resource; analyzing the active resource to identify a text element, an audio element and/or an image element; analyzing the text/image element utilizing an RTI algorithm, that applies at least one of natural language understanding (NLU) or image recognition (IR), to identify the one or more themes; deriving proxy indicia based on the theme(s); substituting, into the active resource, the proxy indicia for the path designator to present the proxy indicia in place of the path designator, the proxy indicia linked to the path designator; and displaying the active resource including the proxy indicia.
