G06F40/258

Identifying section headings in a document

A method, non-transitory computer-readable medium, and system for inferring that certain texts are stylized section headings in an electronic document (ED). Stylized section headings are section headings whose styling is distinct from the body text below each heading. The stylized section headings are identified based on styling information in the ED. Identifying them includes grouping candidate headings based on identification of dominant styling, locating high-level fragments, and repeatedly locating nested fragments within higher-level fragments. The ED may or may not include explicitly identified headings.
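
The grouping step can be illustrated with a minimal sketch. It is not the claimed method: it assumes each line of the ED is already available as a (text, style) pair, treats the style that covers the most characters as the dominant (body) style, and groups every differently styled line as a candidate heading. The function name `group_candidate_headings` and the style tuples are hypothetical.

```python
from collections import Counter, defaultdict


def group_candidate_headings(lines):
    """Group non-body lines by style.

    lines: non-empty list of (text, style) pairs; style is any hashable
    value, e.g. a (font, size, bold) tuple. Hypothetical input format.
    """
    # Weight each style by the characters it covers. The heaviest style is
    # assumed to be the dominant (body) styling.
    weight = Counter()
    for text, style in lines:
        weight[style] += len(text)
    dominant = weight.most_common(1)[0][0]

    # Every line in a non-dominant style becomes a candidate heading,
    # grouped with the other lines that share its style.
    groups = defaultdict(list)
    for index, (text, style) in enumerate(lines):
        if style != dominant:
            groups[style].append((index, text))
    return dominant, dict(groups)
```

Each resulting group holds the line positions of one candidate heading style, which a later step could use to split the document into fragments and then search each fragment for nested headings.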

MULTI-HOP EVIDENCE PURSUIT
20230035641 · 2023-02-02

A method for neural network training is provided. The method inputs a training set of textual claims, lists of evidence including gold evidence chains, and claim labels labelling the evidence with respect to the textual claims. The claim labels include refutes, supports, and not enough information (NEI). The method computes an initial set of document retrievals for each of the textual claims. The method also includes computing an initial set of page element retrievals including sentence retrievals from the initial set of document retrievals for each of the textual claims. The method creates, from the training set of textual claims, a Leave Out Training Set which includes input texts and target texts relating to the labels. The method trains a sequence-to-sequence neural network to generate new target texts from new input texts using the Leave Out Training Set.

Method and apparatus for evaluating matching degree based on artificial intelligence, device and storage medium

The present disclosure provides a method and apparatus for evaluating a matching degree based on artificial intelligence, a device and a storage medium, wherein the method comprises: respectively obtaining word expressions of words in a query and word expressions of words in a title; respectively obtaining context-based word expressions of words in the query and context-based word expressions of words in the title according to the word expressions; generating matching features according to obtained information; determining a matching degree score between the query and the title according to the matching features. The solution of the present disclosure may be applied to improve the accuracy of the evaluation result.

Generating presentation slides with distilled content

A method for generating presentation slides with distilled content includes receiving one or more data files as source material for slide generation, obtaining content from the one or more data files for a slide of a slide presentation, identifying a layout template for the slide based on the content, and distilling the content into distilled content to generate a presentation visualization item based on the distilled content. The distilled content may include a subset of the content. The method may also include generating the slide based on the presentation visualization item and the layout template.

SEMANTIC MAP GENERATION EMPLOYING LATTICE PATH DECODING
20230075341 · 2023-03-09

Techniques include obtaining a location of a trigger word located in unstructured text of a natural-language-text document; determining, based on the location, a set of words following the trigger word in the unstructured text; conducting a lattice decoding operation for the set of words to determine a clause associated with the trigger word, the operation comprising: determining a clause decoding lattice for the set of words, the lattice defining one or more paths through the set of words between the trigger word and an end-of-clause token; selecting a path of the clause decoding lattice; and determining, based on the path selected, the clause including one or more words of the set of words that correspond to the path selected; and generating and storing a data model object including the clause associated with the trigger word.
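
Path selection over such a lattice can be sketched as follows. This is an illustration under stated assumptions, not the decoder the abstract describes: the lattice is modelled as a list of scored edges between numbered positions, the edge scores are made up, an edge with no word stands for reaching the end-of-clause token, and the path with the highest total score is selected by dynamic programming. The function name `best_lattice_path` is hypothetical.

```python
def best_lattice_path(edges, start, end):
    """Return (score, words) for the highest-scoring path, or None.

    edges: list of (src, dst, word, score) tuples with src < dst, so the
    lattice is acyclic. A word of None adds nothing to the clause.
    """
    best = {start: (0.0, [])}  # node -> (total score, words collected)
    # Visiting edges in order of source node guarantees that every path
    # into a node has been scored before its outgoing edges are used.
    for src, dst, word, score in sorted(edges, key=lambda e: e[0]):
        if src not in best:
            continue  # node not reachable from the start
        total, words = best[src]
        new_words = words + [word] if word is not None else words
        if dst not in best or total + score > best[dst][0]:
            best[dst] = (total + score, new_words)
    return best.get(end)


# Words following the hypothetical trigger word "shall"; node 5 is the
# end-of-clause token. All scores are invented for illustration.
edges = [
    (0, 1, "pay", 1.0),
    (1, 2, "the", 0.5),
    (1, 3, "fee", 0.8),        # path that skips "the"
    (2, 3, "fee", 1.0),
    (3, 4, "promptly", -0.5),
    (3, 5, None, 0.2),         # end the clause after "fee"
    (4, 5, None, 0.2),         # end the clause after "promptly"
]
score, words = best_lattice_path(edges, 0, 5)
record = {"trigger": "shall", "clause": " ".join(words)}
```

Here `record` plays the role of the stored data model object that links the trigger word to its decoded clause.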

SYSTEMS AND PROCESSES OF EXTRACTING UNSTRUCTURED DATA FROM COMPLEX DOCUMENTS

The present disclosure relates generally to data extraction from complex documents and, more particularly, to systems, processes and computer program products configured to automatically extract unstructured data from complex documents and perform table understanding on the extracted data. For example, a method includes: detecting, by a computer system, one or more tables within a digitized document; classifying, by the computer system, the one or more detected tables into at least a first table type; identifying, by the computer system, headers within the first table type; extracting, by the computer system, data within the headers and body cells of the first table type; and mapping, by the computer system, a relationship between the extracted data within the headers and the body cells.
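
The final mapping step can be sketched in a few lines. The sketch assumes the table has already been detected and extracted as a list of rows of strings, and that classification has yielded one of two hypothetical table types: `"column_headers"` (headers in the first row) or `"row_headers"` (headers in the first column). Neither the type names nor the function name `map_headers_to_cells` come from the disclosure.

```python
def map_headers_to_cells(table, table_type="column_headers"):
    """Pair every body cell with its header.

    table: list of equal-length rows of strings (hypothetical format).
    Returns one dict per record, keyed by header text.
    """
    if table_type == "row_headers":
        # Headers run down the first column: transpose the table so that
        # they form the first row instead.
        table = [list(column) for column in zip(*table)]
    elif table_type != "column_headers":
        raise ValueError(f"unsupported table type: {table_type}")
    headers, *body = table
    return [dict(zip(headers, row)) for row in body]
```

For example, the rows `["Item", "Qty"]`, `["Bolt", "4"]`, `["Nut", "9"]` would map to one record per body row, each relating the `Item` and `Qty` headers to that row's cells.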