Summary evaluation device, method, program, and storage medium

Abstract

The present disclosure relates to a method of evaluating accuracy of a summary of a document. The method includes receiving a plurality of reference summaries of a document and a system summary of the document. The system summary is generated by a machine. The method further includes extracting, for each reference summary, a tuple that is a pair of words composed of a modified word and a dependent word having a dependency relation to the modified word and a label representing the dependency relation. The method further includes replacing, for each of the extracted tuples, each of the modified word of the tuple's word pair and the dependent word with a class predetermined for the words. The method further generates a score of the system summary based on the class and a set of tuples of the system summary.

Claims

1. A computer-implemented method for evaluating aspects of a document, the method comprising: receiving a plurality of reference summaries of the document; receiving a system summary of the document, wherein the system summary is a machine-generated summary of the document; generating at least a first set of tuples for one of the plurality of reference summaries and at least a second set of tuples for the system summary, wherein each tuple comprises: a head word, a modifier word having a dependency relation with the head word, and a label indicating the dependency relation based on one or more reference summaries of the plurality of reference summaries and the system summary; for each of one or more tuples of at least the first and the second sets of tuples, replacing the head word with a first class of words and the modifier word with a second class of words, wherein the head word and the first class of words are substantially similar in multi-dimensional vector forms; determining a score of the system summary for evaluating the system summary of the document based at least on a common set of tuples between the first set of tuples of the plurality of reference summaries with the replaced first class of words and the second set of tuples of the system summary with the replaced second class of words; and providing the score.

2. The computer-implemented method of claim 1, the method further comprising: receiving a plurality of semantic vectors of words; and determining a plurality of classes based on clustering the plurality of semantic vectors, the plurality of classes including the first class and the second class.

3. The computer-implemented method of claim 1, the method further comprising: determining the score based on a degree of overlap between sets of tuples for the plurality of reference summaries and the second set of tuples.

4. The computer-implemented method of claim 1, wherein each class represents a plurality of words corresponding to a clustered set of semantic vectors based on a cosine similarity among the semantic vectors.

5. The computer-implemented method of claim 1, the method further comprising: extracting one or more sentences from the one or more of the plurality of reference summaries and the system summary; generating a plurality of classes based on clustering a plurality of words in the extracted one or more sentences; and extracting a plurality of pairs of words from the one or more sentences for generating tuples.

6. The computer-implemented method of claim 1, wherein the determining the score is independent of a frequency of tuples appearing at least in the first set of tuples and the second set of tuples.

7. The computer-implemented method of claim 1, wherein the generating the at least a first set of tuples for one of the plurality of reference summaries is based on a first dependency structure analysis of a first set of words in the plurality of reference summaries.

8. A system for evaluating aspects of a document, the system comprises: a processor; and a memory storing computer-executable instructions that when executed by the processor cause the system to: receive a plurality of reference summaries of the document; receive a system summary of the document, wherein the system summary is a machine-generated summary of the document; generate at least a first set of tuples for one of the plurality of reference summaries and at least a second set of tuples for the system summary, wherein each tuple comprises: a head word, a modifier word having a dependency relation with the head word, and a label indicating the dependency relation based on one or more reference summaries of the plurality of reference summaries and the system summary; for each of one or more tuples of at least the first and the second sets of tuples, replace the head word with a first class of words and the modifier word with a second class of words wherein the head word and the first class of words are substantially similar in multi-dimensional vector forms; determine a score of the system summary for evaluating the system summary of the document based at least on a common set of tuples between the first set of tuples of the plurality of reference summaries with the replaced first class of words and the second set of tuples of the system summary with the replaced second class of words; and provide the score.

9. The system of claim 8, the computer-executable instructions when executed further causing the system to: receive a plurality of semantic vectors of words; and determine a plurality of classes based on clustering the plurality of semantic vectors, the plurality of classes including the first class and the second class.

10. The system of claim 8, the computer-executable instructions when executed further causing the system to: determine the score based on a degree of overlap between sets of tuples for the plurality of reference summaries and the second set of tuples.

11. The system of claim 8, wherein each class represents a plurality of words corresponding to a clustered set of semantic vectors based on a cosine similarity among the semantic vectors.

12. The system of claim 8, the computer-executable instructions when executed further causing the system to: extract one or more sentences from the one or more of the plurality of reference summaries and the system summary; generate a plurality of classes based on clustering a plurality of words in the extracted one or more sentences; and extract a plurality of pairs of words from the one or more sentences for generating tuples.

13. The system of claim 8, wherein the determining the score is independent of a frequency of tuples appearing at least in the first set of tuples and the second set of tuples.

14. The system of claim 8, wherein the generating the at least a first set of tuples for one of the plurality of reference summaries is based on a first dependency structure analysis of a first set of words in the plurality of reference summaries.

15. A computer-readable non-transitory recording medium storing computer-executable instructions that when executed by a processor cause a computer system to: receive a plurality of reference summaries of the document; receive a system summary of the document, wherein the system summary is a machine-generated summary of the document; generate at least a first set of tuples for one of the plurality of reference summaries and at least a second set of tuples for the system summary, wherein each tuple comprises: a head word, a modifier word having a dependency relation with the head word, and a label indicating the dependency relation based on one or more reference summaries of the plurality of reference summaries and the system summary; for each of one or more tuples of at least the first and the second sets of tuples, replace the head word with a first class of words and the modifier word with a second class of words wherein the head word and the first class of words are substantially similar in multi-dimensional vector forms; determine a score of the system summary for evaluating the system summary of the document based at least on a common set of tuples between the first set of tuples of the plurality of reference summaries with the replaced first class of words and the second set of tuples of the system summary with the replaced second class of words; and provide the score.

16. The computer-readable non-transitory recording medium of claim 15, the computer-executable instructions when executed further causing the system to: receive a plurality of semantic vectors of words; and determine a plurality of classes based on clustering the plurality of semantic vectors, the plurality of classes including the first class and the second class.

17. The computer-readable non-transitory recording medium of claim 15, the computer-executable instructions when executed further causing the system to: determine the score based on a degree of overlap between sets of tuples for the plurality of reference summaries and the second set of tuples.

18. The computer-readable non-transitory recording medium of claim 15, wherein each class represents a plurality of words corresponding to a clustered set of semantic vectors based on a cosine similarity among the semantic vectors.

19. The computer-readable non-transitory recording medium of claim 15, the computer-executable instructions when executed further causing the system to: extract one or more sentences from the one or more of the plurality of reference summaries and the system summary; generate a plurality of classes based on clustering a plurality of words in the extracted one or more sentences; and extract a plurality of pairs of words from the one or more sentences for generating tuples.

20. The computer-readable non-transitory recording medium of claim 15, wherein the determining the score is independent of a frequency of tuples appearing at least in the first set of tuples and the second set of tuples.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) FIG. 1 is a block diagram illustrating a configuration of a summary evaluation device according to an embodiment of the present invention.

(2) FIG. 2 is a diagram illustrating an example of performing dependency structure analysis to extract tuples.

(3) FIG. 3 is a diagram illustrating an example of replacing words in a tuple with class IDs.

(4) FIG. 4 is a flowchart illustrating a summary evaluation process routine in a summary evaluation device according to an embodiment of the present invention.

DESCRIPTION OF EMBODIMENTS

(5) Hereinafter, an embodiment of the present invention will be described with reference to the drawings.

Overview of Embodiment of Present Invention

(6) In the embodiment of the present invention, the above-mentioned two problems are solved by (1) not taking the frequency of a tuple into consideration when calculating the score and (2) taking the semantic class of a word into consideration when matching tuples. Specifically, a system summary is evaluated by Equation (2) below.

(7) $\begin{matrix} [Formula 2] \\ mBE(R, S) = \frac{|\mathcal{T} \cap T_{S}|}{|\mathcal{T}|} & (2) \end{matrix}$

(8) Here, $\mathcal{T}$ is the set of tuples obtained from all of the reference summaries, and $T_{S}$ is the set of tuples obtained from the system summary.

(9) It is assumed that the words of the tuples included in $\mathcal{T}$ and $T_{S}$ have been replaced with the class IDs of the classes corresponding to the words. Conversion from words to class IDs may be performed by clustering words using a K-means method, a hierarchical clustering method, or the like on the basis of word vectors and using the cluster ID of each word as its class ID.
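As a hypothetical worked instance of Equation (2) (the class IDs $c_1$ to $c_5$ and the dependency labels are illustrative assumptions, not values taken from the embodiment), suppose that the tuples, written as (head class, modifier class, label), are

$\mathcal{T} = \{(c_3, c_1, \mathrm{nsubj}),\ (c_3, c_2, \mathrm{dobj}),\ (c_2, c_5, \mathrm{amod}),\ (c_3, c_4, \mathrm{nmod})\}$

$T_{S} = \{(c_3, c_1, \mathrm{nsubj}),\ (c_2, c_5, \mathrm{amod}),\ (c_4, c_4, \mathrm{advmod})\}$

Two of the four reference tuples also appear in the system summary, so

$mBE(R, S) = \frac{|\mathcal{T} \cap T_{S}|}{|\mathcal{T}|} = \frac{2}{4} = 0.5$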

Configuration of Summary Evaluation Device According to Embodiment of Present Invention

(10) Next, a configuration of a summary evaluation device according to the embodiment of the present invention will be described. As illustrated in FIG. 1, a summary evaluation device 100 according to an embodiment of the present invention can be configured as a computer including a CPU, a RAM, and a ROM storing a program for executing a summary evaluation process routine to be described later and various pieces of data. The summary evaluation device 100 functionally includes an input unit 10, an arithmetic unit 20, and an output unit 50 as illustrated in FIG. 1.

(11) The input unit 10 receives K reference summaries obtained in advance for a summary target document and a system summary generated for the summary target document by a system.

(12) The arithmetic unit 20 includes a sentence breaking unit 30, a word clustering unit 32, a tuple extraction unit 34, and a score calculation unit 36.

(13) The sentence breaking unit 30 breaks the K reference summaries and the system summary received by the input unit 10 into sentences. Sentence breaking may be performed using an existing sentence breaking tool, or a breaker may be implemented by creating breaking rules on the basis of information such as punctuation marks.
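A rule-based breaker of the kind mentioned above might look like the following minimal sketch. The function name `break_sentences` and the single rule used here (a sentence ends at `.`, `!`, or `?` followed by whitespace) are assumptions made for illustration; the embodiment does not specify the rules, and this rule would wrongly split abbreviations such as "e.g. this".

```python
import re

# Hypothetical rule: split at whitespace that follows '.', '!' or '?'.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def break_sentences(summary: str) -> list[str]:
    """Split a summary into sentences using a punctuation-based rule."""
    parts = _SENTENCE_END.split(summary.strip())
    # Drop empty strings, which appear when the input itself is empty.
    return [part for part in parts if part]


sentences = break_sentences("The cat sat. It purred! Why?")
```

A production breaker would likely need further rules (abbreviations, quotations, language-specific punctuation), which is why an existing tool may be preferable.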

(14) The word clustering unit 32 clusters the words included in the K reference summaries and the system summary broken by the sentence breaking unit 30, using semantic vectors of the words. Word clustering can be realized by expressing words as n-dimensional vectors and clustering the vectors on the basis of the cosine similarity between them using a K-means method, a hierarchical clustering method, or the like. A tool such as word2vec may be used for expressing words as n-dimensional vectors.
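The clustering step could be sketched as below, assuming K-means on unit-length vectors so that the dot product equals the cosine similarity (often called spherical K-means). The function names, the deterministic initialisation from the first k words, and the two-dimensional toy vectors are all assumptions for illustration; in practice the vectors would come from a tool such as word2vec.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0 else [x / norm for x in vector]


def cosine(u, v):
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(a * b for a, b in zip(u, v))


def cluster_words(vectors, k, iterations=10):
    """Cluster words by cosine similarity; returns {word: class_id}."""
    words = sorted(vectors)
    unit = {w: normalize(vectors[w]) for w in words}
    # Hypothetical deterministic initialisation: the first k words are centroids.
    centroids = [unit[w] for w in words[:k]]
    assignment = {}
    for _ in range(iterations):
        # Assign each word to the centroid with the highest cosine similarity.
        assignment = {
            w: max(range(k), key=lambda c: cosine(unit[w], centroids[c]))
            for w in words
        }
        # Recompute each centroid as the normalised mean of its members.
        for c in range(k):
            members = [unit[w] for w in words if assignment[w] == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return assignment


# Toy vectors (assumed values): two animal-like and two motion-like words.
toy_vectors = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "run": [0.1, 0.9],
    "walk": [0.2, 0.8],
}
class_of = cluster_words(toy_vectors, k=2)
```

With these toy vectors, "cat" and "dog" end up in one class and "run" and "walk" in the other; the numeric class IDs themselves carry no meaning beyond identifying a cluster.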

(15) The tuple extraction unit 34 extracts tuples which are sets of a word pair composed of a head word and a modifier word having a dependency relation and a label indicating the dependency relation for each of the K reference summaries and the system summary broken by the sentence breaking unit 30. For example, tuples are extracted by performing such dependency structure analysis as illustrated in FIG. 2. Subsequently, the tuple extraction unit 34 replaces each of the head word and the modifier word of the word pair of each of the extracted tuples with a class in the word clustering results of the word clustering unit 32. For example, as illustrated in FIG. 3, a word in a tuple is replaced with an index of a cluster, which is used as a class ID of a cluster to which the word belongs.
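The extraction and replacement described above can be sketched as follows, assuming the dependency structure analysis has already been performed by some parser. The `Token` record (position, word, head position, label), the relation labels, and the class map are hypothetical stand-ins; FIG. 2 and FIG. 3 may use a different representation.

```python
from typing import NamedTuple


class Token(NamedTuple):
    index: int  # 1-based position in the sentence
    word: str
    head: int   # position of the head token, 0 for the root
    label: str  # dependency relation to the head


def extract_tuples(sentence):
    """Return the set of (head word, modifier word, label) tuples."""
    by_index = {token.index: token for token in sentence}
    return {
        (by_index[token.head].word, token.word, token.label)
        for token in sentence
        if token.head != 0  # the root has no head word, so it yields no tuple
    }


def replace_with_classes(tuples, class_of):
    """Replace head and modifier words with class IDs; labels are kept."""
    return {(class_of[head], class_of[mod], label) for head, mod, label in tuples}


# Hand-written parse of "dogs chase cats" (assumed parser output).
parsed = [
    Token(1, "dogs", 2, "nsubj"),
    Token(2, "chase", 0, "root"),
    Token(3, "cats", 2, "dobj"),
]
word_tuples = extract_tuples(parsed)
class_tuples = replace_with_classes(word_tuples, {"dogs": 0, "cats": 0, "chase": 1})
```

In this sketch a word missing from the class map raises `KeyError`; how unknown words should be handled is not stated in the embodiment.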

(16) The score calculation unit 36 calculates, according to Equation (3) below, a score corresponding to the degree of overlap between the group of tuples in all K reference summaries and the group of tuples of the system summary, both having been replaced with the classes by the tuple extraction unit 34, and outputs the calculated score to the output unit 50.

(17) $\begin{matrix} [Formula 3] \\ mBE(R, S) = \frac{|\mathcal{T} \cap T_{S}|}{|\mathcal{T}|} & (3) \end{matrix}$
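Equation (3) can be read as a set computation, sketched below under two assumptions: the reference group is the union of the tuples of all K reference summaries, and an empty reference group scores 0 (the embodiment does not address that case). The function name `mbe_score` and the sample tuples are illustrative.

```python
def mbe_score(reference_tuple_sets, system_tuples):
    """Fraction of distinct reference tuples that also occur in the system summary."""
    # Union over all K reference summaries; duplicates collapse, so tuple
    # frequency does not affect the score.
    reference = set().union(*reference_tuple_sets)
    system = set(system_tuples)
    if not reference:
        return 0.0  # assumption: no reference tuples means a score of 0
    return len(reference & system) / len(reference)


# Tuples are (head class ID, modifier class ID, label); values are made up.
references = [
    {(1, 0, "nsubj"), (1, 0, "dobj")},
    {(1, 0, "nsubj"), (2, 0, "amod"), (1, 3, "nmod")},
]
# The repeated tuple in the system summary is counted only once.
system = [(1, 0, "nsubj"), (1, 0, "nsubj"), (2, 0, "amod"), (9, 9, "advmod")]
score = mbe_score(references, system)
```

Here the reference group holds four distinct tuples, two of which occur in the system summary, so the score is 0.5.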

(18) As described above, the reference summaries and the system summary are each treated as a group of tuples obtained from a dependency structure, and the score calculation formula does not take the frequency of each tuple in the original summary into consideration. It is therefore possible to prevent a situation in which some words receive an unduly high score. Moreover, since the words constituting a tuple are replaced with the class IDs of word clusters, tuples having similar meanings can be regarded as identical tuples. In this way, it is possible to evaluate a summary by taking the semantic class of words into consideration.
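The effect of class replacement on matching can be illustrated with a small comparison. The words, labels, and class assignments below are invented for the example, and the helper names are hypothetical.

```python
def overlap(reference, system):
    """Share of reference tuples that also occur in the system tuples."""
    return len(reference & system) / len(reference)


def to_classes(tuples, class_of):
    """Replace the head and modifier words of each tuple with class IDs."""
    return {(class_of[head], class_of[mod], label) for head, mod, label in tuples}


# Assumed clustering result: synonyms share a class ID.
class_of = {"chase": 0, "pursue": 0, "dog": 1, "puppy": 1, "cat": 2, "kitten": 2}

reference = {("chase", "dog", "nsubj"), ("chase", "cat", "dobj")}
system = {("pursue", "puppy", "nsubj"), ("pursue", "kitten", "dobj")}

# Matching on surface words finds no common tuple.
word_level = overlap(reference, system)
# Matching on class IDs treats the semantically similar tuples as identical.
class_level = overlap(to_classes(reference, class_of), to_classes(system, class_of))
```

In this toy case the word-level overlap is 0.0 while the class-level overlap is 1.0, which is the behaviour the paragraph above describes.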

Operation of Summary Evaluation Device According to Embodiment of Present Invention

(19) Next, an operation of the summary evaluation device 100 according to the embodiment of the present invention will be described. When the input unit 10 receives K reference summaries obtained in advance for a summary target document and a system summary generated for the summary target document by a system, the summary evaluation device 100 executes a summary evaluation process routine illustrated in FIG. 4.

(20) First, in step S100, the K reference summaries and the system summary received by the input unit 10 are broken into sentences.

(21) Subsequently, in step S102, the words included in the K reference summaries and the system summary broken in step S100 are clustered using semantic vectors of words.

(22) In step S104, tuples which are sets of a word pair composed of a head word and a modifier word having a dependency relation and a label indicating the dependency relation are extracted for each of the K reference summaries and the system summary broken in step S100.

(23) In step S106, each of the head word and the modifier word of the word pair of each of the tuples extracted in step S104 is replaced with a class in the word clustering results in step S102.

(24) In step S108, a score corresponding to the degree of overlap between the group of tuples in all K reference summaries and the group of tuples of the system summary, both replaced with the classes by the tuple extraction unit 34, is calculated according to Equation (3) above and is output to the output unit 50.

(25) As described above, according to the summary evaluation device according to the embodiment of the present invention, it is possible to evaluate a system summary with high accuracy according to the following steps.

(26) (1) Tuples which are sets of a word pair composed of a head word and a modifier word having a dependency relation and a label indicating the dependency relation are extracted for each of a plurality of reference summaries obtained in advance for a summary target document and a system summary generated for the summary target document by a system.

(27) (2) Each of the head word and the modifier word of the word pair of each of the extracted tuples is replaced with a class determined in advance for a word.

(28) (3) A score of the system summary is calculated on the basis of a group of tuples for all the plurality of reference summaries and a group of tuples of the system summary, replaced with the classes.

(29) The present invention is not limited to the above-described embodiment, and various modifications and applications can be made without departing from the spirit of the present invention.

(30) For example, in the above-described embodiment, a case of replacing the head word and the modifier word with class IDs has been described as an example. However, the present invention is not limited thereto, but a word may be replaced with a value or the like corresponding to a cluster to which the word belongs.

(31) For example, in the above-described embodiment, a case in which a summary is broken into sentences by the sentence breaking unit 30 and the words in the summary are clustered by the word clustering unit 32 has been described. However, the present invention is not limited thereto; the sentence breaking unit 30 and the word clustering unit 32 may be omitted, and reference summaries and a system summary which have been broken into sentences in advance, together with a clustering result obtained in advance, may be received.

(32) Although the above-described embodiment has been described for a case in which a program is installed in advance, the program may be provided in a state of being stored in a computer-readable recording medium or may be provided via a network.

REFERENCE SIGNS LIST

(33)
10 Input unit
20 Arithmetic unit
30 Sentence breaking unit
32 Word clustering unit
34 Tuple extraction unit
36 Score calculation unit
50 Output unit
100 Summary evaluation device

CPC classification

G06F16/345; G06F40/20; G06F40/279; G06V30/2272; G06F16/355; G06F40/30; G06F40/211

International classification

G06F40/30; G06F16/35; G06F40/279; G06F40/211; G06F16/34; G06V30/40
