G06F40/289

Method and device for keyword extraction and storage medium

A method and device for keyword extraction and a storage medium. The method includes receiving, at a terminal, an original document, acquiring, at the terminal, a candidate set by extracting at least one candidate phrase from the original document, acquiring, at the terminal, an association degree between the at least one candidate phrase in the candidate set and the original document, acquiring, at the terminal, a divergence degree of the at least one candidate phrase in the candidate set, and updating, at the terminal, a key phrase set of the original document by selecting the at least one candidate phrase from the candidate set as at least one key phrase based on the association degree and the divergence degree.
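
The selection step in this abstract can be illustrated with a minimal sketch. The abstract does not define how the association degree or the divergence degree is computed, so both measures below are assumptions: association is taken as the mean relative frequency of a phrase's words in the document, and divergence as one minus the highest Jaccard word overlap with the key phrases already chosen. The n-gram candidate extraction, the function names, and the `weight` parameter are likewise hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that strips common punctuation."""
    stripped = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in stripped if w]


def candidate_phrases(tokens, max_len=2):
    """All n-grams of up to max_len words (a stand-in for real candidate extraction)."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def association(phrase, counts, total):
    """Assumed association degree: mean relative frequency of the phrase's words."""
    words = phrase.split()
    return sum(counts[w] for w in words) / (len(words) * total)


def divergence(phrase, selected):
    """Assumed divergence degree: 1 - max Jaccard word overlap with selected phrases."""
    if not selected:
        return 1.0
    a = set(phrase.split())
    overlaps = (len(a & set(s.split())) / len(a | set(s.split())) for s in selected)
    return 1.0 - max(overlaps)


def extract_key_phrases(document, k=3, weight=0.5):
    """Greedily move candidates into the key phrase set using both degrees."""
    tokens = tokenize(document)
    counts = Counter(tokens)
    candidates = candidate_phrases(tokens)
    key_phrases = []
    while candidates and len(key_phrases) < k:
        # sorted() makes tie-breaking deterministic across runs.
        best = max(
            sorted(candidates),
            key=lambda p: weight * association(p, counts, len(tokens))
            + (1 - weight) * divergence(p, key_phrases),
        )
        key_phrases.append(best)
        candidates.remove(best)
    return key_phrases
```

Calling `extract_key_phrases("neural network training uses neural network layers", k=3)` returns three distinct phrases. Because divergence is recomputed against the growing key phrase set, a candidate that repeats words already selected is penalized on later rounds.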

Word attribution prediction from subject data

A digital attribution system is described to generate predictions of word attributions from subject data, e.g., titles, subject lines of emails, and so on. To do so, the digital attribution system first generates attribution scores that describe the amount to which respective words in the subject data cause performance of a corresponding outcome. The attribution scores are then used by the digital attribution system to generate representations for display in a user interface for respective words in the subject data and may also be used to generate attribution recommendations of changes to be made to the subject data.
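
One simple way to picture a per-word attribution score is leave-one-out scoring: predict the outcome for the full subject line, then again with each word removed, and take the difference. The abstract does not say how its scores are computed, so this technique is an assumption, and `toy_open_rate` is a hypothetical outcome model with made-up weights rather than anything from the source.

```python
def toy_open_rate(words):
    """Hypothetical outcome model: made-up word weights, clipped to [0, 1]."""
    weights = {"free": 0.25, "sale": 0.125, "urgent": -0.25}
    score = 0.5 + sum(weights.get(w.lower(), 0.0) for w in words)
    return min(1.0, max(0.0, score))


def leave_one_out_attributions(subject, predict=toy_open_rate):
    """Score each word by how much the prediction drops when it is removed."""
    words = subject.split()
    baseline = predict(words)
    return [
        (word, baseline - predict(words[:i] + words[i + 1:]))
        for i, word in enumerate(words)
    ]


def recommend_changes(attributions):
    """Assumed recommendation rule: flag words with a negative attribution."""
    return [word for word, score in attributions if score < 0]


scores = leave_one_out_attributions("Urgent free offer")
# With the toy weights: [("Urgent", -0.25), ("free", 0.25), ("offer", 0.0)]
flagged = recommend_changes(scores)  # ["Urgent"]
```

A user interface could map each score to a highlight color per word, and the flagged list is one possible basis for a recommended change.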

Corpus cleaning method and corpus entry system
11580299 · 2023-02-14

The present disclosure provides a corpus cleaning method and a corpus entry system. The method includes: obtaining an input utterance; generating a predicted value of an information amount of each word in the input utterance according to the context of the input utterance using a pre-trained general model; and determining redundant words according to the predicted value of the information amount of each word, and determining whether to remove the redundant words from the input utterance. In such a manner, the objectivity and accuracy of corpus cleaning can be improved.
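
The threshold step of this method can be sketched as follows. The abstract predicts each word's information amount from context with a pre-trained general model. As a loudly simplified stand-in, the sketch below uses context-free self-information, -log2 p(word), from a hypothetical unigram probability table. The `threshold` value, the `remove` flag, and the fallback probability for unseen words are also assumptions.

```python
import math


def information_amounts(words, word_prob, unseen_prob=1e-6):
    """Stand-in predictor: self-information of each word in bits.

    A real system would use a context-aware pre-trained model here.
    """
    return [-math.log2(word_prob.get(w.lower(), unseen_prob)) for w in words]


def clean_utterance(utterance, word_prob, threshold=3.0, remove=True):
    """Flag low-information words as redundant and optionally remove them."""
    words = utterance.split()
    info = information_amounts(words, word_prob)
    redundant = [w for w, bits in zip(words, info) if bits < threshold]
    if not remove:
        # Keep the utterance unchanged but still report the redundant words.
        return utterance, redundant
    kept = [w for w, bits in zip(words, info) if bits >= threshold]
    return " ".join(kept), redundant


# Hypothetical unigram probabilities, for illustration only.
probs = {"um": 0.5, "please": 0.5, "book": 0.01, "flight": 0.01}
cleaned, dropped = clean_utterance("um please book flight", probs)
# cleaned == "book flight", dropped == ["um", "please"]
```

Separating detection from removal mirrors the abstract's two decisions: which words are redundant, and whether to remove them.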

METHOD FOR PRE-TRAINING MODEL, DEVICE, AND STORAGE MEDIUM
20230040095 · 2023-02-09

A method and apparatus for pre-training a model, a device, a storage medium, and a program product. An embodiment of the method includes: acquiring a sample natural language text; generating N types of prompt words based on the sample natural language text, where N is a positive integer; generating sample input data based on the sample natural language text and the N types of prompt words; and training an initial language model based on the sample input data, to obtain a pre-trained language model.
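
The data construction steps (generate N types of prompt words, then build sample input data) can be sketched as below. The abstract does not name the prompt types, so the two generators here, a length prompt and a keyword prompt, are hypothetical, as are the bracketed prompt format and all function names. The training step itself is omitted.

```python
def length_prompt(text):
    """Hypothetical prompt type 1: the word count of the sample text."""
    return f"[length={len(text.split())}]"


def keyword_prompt(text):
    """Hypothetical prompt type 2: the longest word, used as a crude keyword."""
    words = [w.strip(".,").lower() for w in text.split()]
    return f"[keyword={max(words, key=len, default='')}]"


# N = 2 prompt types in this sketch; more generators can be appended.
PROMPT_GENERATORS = [length_prompt, keyword_prompt]


def build_sample_input(text, generators=PROMPT_GENERATORS):
    """Prefix the sample text with one prompt per prompt type."""
    prompts = [generate(text) for generate in generators]
    return " ".join(prompts + [text])


sample = build_sample_input("Models learn from text.")
# sample == "[length=4] [keyword=models] Models learn from text."
```

Each string produced this way would then be fed to whatever training loop is used for the initial language model.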

LEARNING DATA GENERATION DEVICE, METHOD, AND RECORD MEDIUM FOR STORING PROGRAM

A learning data generation device includes processing circuitry to extract a cause expression and a result expression from an input text, and to generate a modified text by at least one of a method of interchanging the cause expression and the result expression and a method of specifying one of the cause expression and the result expression as a modification target sentence and replacing the modification target sentence with a replacement candidate sentence dissimilar to the modification target sentence.
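
Both modification methods can be sketched on a toy pattern. The abstract does not specify how cause and result expressions are extracted or how dissimilarity is measured. The sketch assumes sentences of the form "<result> because <cause>" matched by a regular expression, and it uses Jaccard word overlap as a hypothetical similarity measure.

```python
import re

# Assumed surface pattern: "<result> because <cause>."
PATTERN = re.compile(
    r"^(?P<result>.+?)\s+because\s+(?P<cause>.+?)\.?$", re.IGNORECASE
)


def extract_cause_result(text):
    """Return (cause, result), or None if the assumed pattern does not match."""
    match = PATTERN.match(text.strip())
    if match is None:
        return None
    return match.group("cause"), match.group("result")


def interchange(text):
    """Method 1: swap the cause expression and the result expression."""
    pair = extract_cause_result(text)
    if pair is None:
        return None
    cause, result = pair
    return f"{cause} because {result}."


def jaccard(a, b):
    """Word-level Jaccard similarity (an assumed similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def replace_with_dissimilar(text, candidates, target="cause"):
    """Method 2: replace the target expression with the least similar candidate."""
    pair = extract_cause_result(text)
    if pair is None or not candidates:
        return None
    cause, result = pair
    original = cause if target == "cause" else result
    replacement = min(candidates, key=lambda c: jaccard(c, original))
    if target == "cause":
        return f"{result} because {replacement}."
    return f"{replacement} because {cause}."
```

For example, `interchange("the road is wet because it rained.")` yields `"it rained because the road is wet."`, a modified text whose causal direction is reversed.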

METHOD OF DETECTING, SEGMENTING AND EXTRACTING SALIENT REGIONS IN DOCUMENTS USING ATTENTION TRACKING SENSORS

A method and system for detecting, segmenting, and extracting salient regions in documents by using attention tracking sensors are provided. The method includes: receiving an image that corresponds to a document; receiving, from a sensor, a sequence of measurements that correspond to a human reading of the document; determining, based on the sequence of measurements, at least one region of the document as being a salient document region; demarcating the salient document region in an electronically displayable manner; and outputting a file that includes a displayable version of the document with the demarcated document region. The salient document region may include a title, a section header, and/or a table. The sensor may be an eye-tracking sensor that detects a sequence of eye-gaze positions on the document as a function of time.
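
The step of determining a salient region from timed eye-gaze positions can be illustrated with a dwell-time heuristic. The abstract does not state how salience is decided, so this heuristic is an assumption. The sketch also assumes that candidate regions are already available as bounding boxes, and the `min_dwell` threshold and all names are hypothetical.

```python
def dwell_times(gaze, regions):
    """Accumulate gaze time per region.

    gaze: list of (t, x, y) samples sorted by time t (seconds).
    regions: dict of name -> (x0, y0, x1, y1) bounding box in image pixels.
    Each sample is credited with the time until the next sample.
    """
    totals = {name: 0.0 for name in regions}
    for (t0, x, y), (t1, _, _) in zip(gaze, gaze[1:]):
        for name, (x0, y0, x1, y1) in regions.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                totals[name] += t1 - t0
    return totals


def salient_regions(gaze, regions, min_dwell=1.0):
    """Assumed rule: a region is salient if total dwell time reaches min_dwell."""
    totals = dwell_times(gaze, regions)
    return [name for name, dwell in totals.items() if dwell >= min_dwell]


# Hypothetical layout and gaze trace, for illustration only.
regions = {"title": (0, 0, 100, 20), "body": (0, 30, 100, 200)}
gaze = [(0.0, 10, 10), (0.5, 50, 10), (1.5, 20, 100), (1.75, 30, 120)]
salient = salient_regions(gaze, regions)  # ["title"]
```

The returned region names could then drive the demarcation step, for example by drawing each salient bounding box onto the displayable version of the document.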
