ABSTRACT GENERATION DEVICE, METHOD, PROGRAM, AND RECORDING MEDIUM
20220189468 · 2022-06-16
Assignee
Inventors
- Tsutomu Hirao (Tokyo, JP)
- Atsunori OGAWA (Tokyo, JP)
- Tomohiro NAKATANI (Tokyo, JP)
- Masaaki NAGATA (Tokyo, JP)
CPC classification
G10L15/22
PHYSICS
International classification
G10L15/193
PHYSICS
Abstract
A speech recognition unit (12) converts an input utterance sequence into a confusion network sequence constituted by a k-best of candidate words of speech recognition results; a lattice generating unit (14) generates a lattice sequence having the candidate words as internal nodes and a combination of k words among the candidate words for an identical speech as an external node, in which edges are extended between internal nodes other than internal nodes included in an identical external node, from the confusion network sequence; an integer programming problem generating unit (16) generates an integer programming problem for selecting a path that maximizes an objective function including at least a coverage score of an important word, of paths following the internal nodes with the edges extended, in the lattice sequence; and a summary generating unit (18) generates a high-quality summary having few speech recognition errors and low redundancy using candidate words indicated by the internal nodes included in the path selected by solving the integer programming problem, under a constraint on the length of a summary to be generated.
Claims
1. A summary generating apparatus, comprising: a speech recognition unit configured to convert an input utterance sequence into a confusion network sequence constituted by a k-best of candidate words of speech recognition results; a lattice generating unit configured to generate a lattice sequence having the candidate words as internal nodes and a combination of k words among the candidate words for an identical speech as an external node, in which edges are extended between internal nodes other than internal nodes included in an identical external node, from the confusion network sequence; an integer programming problem generating unit configured to generate an integer programming problem for selecting a path that maximizes an objective function including at least a coverage score of an important word, of paths following the internal nodes with the edges extended, in the lattice sequence; and a summary generating unit configured to generate a summary of the utterance sequence using the candidate words indicated by the internal nodes included in the path selected by solving the integer programming problem, under a constraint on a length of a summary to be generated.
2. The summary generating apparatus according to claim 1, wherein the coverage score of the important word is a score that increases as the number of candidate words included in the summary to be generated increases, among candidate words that are independent words included in the lattice sequence.
3. The summary generating apparatus according to claim 1, wherein the objective function further comprises a score of the internal node represented by an importance of a candidate word included in the summary to be generated, and a score of an edge indicating a likelihood of connection between candidate words at both ends of the edge included in the summary to be generated.
4. The summary generating apparatus according to claim 3, wherein the score of the internal node comprises an appearance frequency and an inverse document frequency of the candidate word and a confidence of speech recognition for the candidate word.
5. The summary generating apparatus according to claim 3, wherein the score of the edge comprises a bigram appearance rate of candidate words at both ends of the edge.
6. A summary generating method performed at a summary generating apparatus comprising a speech recognition unit, a lattice generating unit, an integer programming problem generating unit, and a summary generating unit, the method comprising: converting, at the speech recognition unit, an input utterance sequence into a confusion network sequence constituted by a k-best of candidate words of speech recognition results; generating, at the lattice generating unit, a lattice sequence having the candidate words as internal nodes and a combination of k words among the candidate words for an identical speech as an external node, in which edges are extended between internal nodes other than internal nodes included in an identical external node, from the confusion network sequence; generating, at the integer programming problem generating unit, an integer programming problem for selecting a path that maximizes an objective function including at least a coverage score of an important word, of paths following the internal nodes with the edges extended, in the lattice sequence; and generating, at the summary generating unit, a summary of the utterance sequence using the candidate words indicated by the internal nodes included in the path selected by solving the integer programming problem, under a constraint on a length of a summary to be generated.
7. A summary generating program for causing a computer to function as each unit of the summary generating apparatus according to claim 1.
8. A storage medium that stores a summary generating program for causing a computer to function as each unit of the summary generating apparatus according to claim 1.
9. A summary generating program for causing a computer to function as each unit of the summary generating apparatus according to claim 2.
10. A summary generating program for causing a computer to function as each unit of the summary generating apparatus according to claim 3.
11. A summary generating program for causing a computer to function as each unit of the summary generating apparatus according to claim 4.
12. A summary generating program for causing a computer to function as each unit of the summary generating apparatus according to claim 5.
13. The summary generating apparatus according to claim 2, wherein the objective function further comprises a score of the internal node represented by an importance of a candidate word included in the summary to be generated, and a score of an edge indicating a likelihood of connection between candidate words at both ends of the edge included in the summary to be generated.
14. A summary generating program for causing a computer to function as each unit of the summary generating apparatus according to claim 13.
Description
BRIEF DESCRIPTION OF DRAWINGS
DESCRIPTION OF EMBODIMENTS
[0020] Hereinafter, an example of an embodiment for implementing the disclosed technology will be described in detail with reference to drawings.
[0021] A summary generating apparatus according to a present embodiment is configured as a computer including a central processing unit (CPU), a random access memory (RAM), a read only memory (ROM), a hard disk drive (HDD), and the like. A summary generating program according to the present embodiment is stored in the ROM. Note that the summary generating program may be stored in the HDD.
[0022] Also, the summary generating program may be installed in advance in the summary generating apparatus, for example. Alternatively, the summary generating program may be provided in a form stored in a nonvolatile storage medium or distributed via a network, and installed in the summary generating apparatus as appropriate. Note that examples of the nonvolatile storage medium include a compact disc read only memory (CD-ROM), a magneto-optical disc, a digital versatile disc read only memory (DVD-ROM), a flash memory, a memory card, and the like.
[0023] The CPU functions as each of functional units of the summary generating apparatus, which will be described below, by reading and executing the summary generating program stored in the ROM.
[0024] As illustrated in
[0025] The summary generating apparatus 10 functionally includes a speech recognition unit 12, a lattice generating unit 14, an integer programming problem generating unit 16, and a summary generating unit 18, as illustrated in
[0026] The speech recognition unit 12 converts the input utterance sequence into a k-best confusion network sequence. The confusion network is a unified representation of a plurality of speech recognition candidates as a single network. An example of the confusion network for the i-th utterance included in the utterance sequence is illustrated in
[0027] The speech recognition unit 12 passes the converted confusion network sequence to the lattice generating unit 14.
[0028] The lattice generating unit 14 converts the confusion network sequence received from the speech recognition unit 12 into a lattice sequence having candidate words of the speech recognition result as internal nodes and a combination of k words among the candidate words for an identical speech as an external node. The lattice generating unit 14 also prepares nodes designated as BOU and EOU as special nodes representing start and end of utterance, respectively. The lattice generating unit 14 extends an edge between any internal node and each of an internal node located to the left thereof and the BOU, and extends an edge between the node and each of an internal node located to the right thereof and the EOU. No edge is extended between internal nodes belonging to an identical external node. The lattice generating unit 14 thus generates a lattice sequence in which edges are extended to obtain all paths that follow the internal nodes from the BOU to the EOU.
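As an illustrative sketch (not the patented implementation), the lattice construction described above can be expressed in Python, assuming the confusion network is represented simply as a list of external slots, each holding the k-best candidate words for one segment of speech; the function name and data layout are assumptions for illustration:

```python
from itertools import product

def build_lattice(confusion_network):
    """confusion_network: list of external slots, each a list of the
    k-best candidate words for one speech segment.

    Returns (nodes, edges): an internal node is keyed (slot, rank);
    edges connect internal nodes of distinct external slots only,
    plus the special start/end nodes "BOU" and "EOU"."""
    nodes = {(j, k): word
             for j, slot in enumerate(confusion_network)
             for k, word in enumerate(slot)}
    edges = set()
    for a, b in product(nodes, nodes):
        if a[0] < b[0]:          # rightward edges across distinct slots;
            edges.add((a, b))    # none inside an identical external node
    for n in nodes:
        edges.add(("BOU", n))    # every path starts at BOU
        edges.add((n, "EOU"))    # and ends at EOU
    return nodes, edges
```

Because every internal node is linked to every internal node on its right as well as to BOU and EOU, all paths from BOU to EOU through the internal nodes are obtainable, as stated above.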
[0029] The lattice generating unit 14 passes the generated lattice sequence to the integer programming problem generating unit 16.
[0030] The integer programming problem generating unit 16 generates an integer programming problem from the lattice sequence received from the lattice generating unit 14 for selecting a path of internal nodes that maximizes an objective function, under a constraint on the number of letters of a summary to be generated. In the present embodiment, an integer programming problem is generated for selecting a path which maximizes the sum of importances of internal nodes, the sum of importances of edges, and the coverage score of important words, from the lattice sequence.
[0031] The objective function of summary generation is shown in Expression (1) below, in which each sum runs over the index sets defined in the following paragraph.

max αΣf.sub.i,j,kn.sub.i,j,k+βΣg.sub.i,s,p.sup.i,t,qe.sub.i,s,p.sup.i,t,q+Σz.sub.h [Math. 1]
[0032] Let i be an index of a lattice, let j be an index of an external node in the i-th lattice, and let k be an index of an internal node included in the j-th external node of the i-th lattice. Let a lattice set be U, let a set of external nodes in the i-th lattice be V.sub.i, and let a set of internal nodes included in the j-th external node in the i-th lattice be N.sub.i,j. In addition, let W be a set of independent words included in U.
[0033] In Expression (1), the first term represents a score of a node, the second term represents a score of an edge, and the third term represents a coverage score of an important word. n.sub.i,j,k is a binary variable representing whether the k-th word included in the j-th external node of the i-th lattice is included in a summary, and f.sub.i,j,k is an importance score for w.sub.i,j,k. The definition of f.sub.i,j,k is as shown in Expression (13) below.
f.sub.i,j,k=tfidf(w.sub.i,j,k)+conf(w.sub.i,j,k) [Math. 2]
[0034] Here, tfidf( ) is the tfidf score of a word, where tf (term frequency) is the appearance frequency of the word in the utterance sequence and idf (inverse document frequency) is obtained from the IDF_DB 22. conf( ) is the recognition confidence score of the word, which is a value obtained when the speech recognition unit 12 performs speech recognition.
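The node score f.sub.i,j,k of Expression (13) can be sketched as follows, assuming the idf values and recognition confidences are supplied as plain dictionaries; the names node_scores, idf_db, and confidences are illustrative, not taken from the patent:

```python
from collections import Counter

def node_scores(candidates, idf_db, confidences):
    """candidates: list of (slot, rank, word) triples for the whole
    utterance sequence; idf_db: word -> idf value (stand-in for IDF_DB 22);
    confidences: (slot, rank) -> recognizer confidence.

    Returns f_{i,j,k} = tfidf(w) + conf(w), with tf counted over the
    entire utterance sequence."""
    tf = Counter(w for _, _, w in candidates)          # term frequencies
    return {(j, k): tf[w] * idf_db.get(w, 0.0) + confidences[(j, k)]
            for j, k, w in candidates}
```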
[0035] e.sub.i,s,p.sup.i,t,q is a binary variable representing whether or not the edge between w.sub.i,s,p and w.sub.i,t,q is included in the summary. g.sub.i,s,p.sup.i,t,q is an importance score of e.sub.i,s,p.sup.i,t,q and can be the bigram probability of the word w.sub.i,t,q appearing after the word w.sub.i,s,p, obtained from the language model DB 20. The definition of g.sub.i,s,p.sup.i,t,q is as shown in the following Expression (14). Note that g.sub.i,s,p.sup.i,t,q is not limited to the example shown in Expression (14) as long as it scores the likelihood of word-to-word connection.
g.sub.i,s,p.sup.i,t,q=P.sub.bigram(w.sub.i,t,q|w.sub.i,s,p). [Math. 3]
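The bigram probability of Expression (14) can be sketched with a maximum-likelihood bigram model estimated from a toy corpus, standing in for the language model DB 20; the helper name bigram_model is a hypothetical choice for illustration:

```python
from collections import Counter

def bigram_model(corpus_sentences):
    """Estimate MLE bigram probabilities P(w_t | w_s) from a toy corpus
    of tokenized sentences; a stand-in for the language model DB 20."""
    uni, bi = Counter(), Counter()
    for sent in corpus_sentences:
        for w1, w2 in zip(sent, sent[1:]):
            uni[w1] += 1              # count w1 as a bigram history
            bi[(w1, w2)] += 1         # count the bigram (w1, w2)
    # return P(w_t | w_s); 0.0 for an unseen history
    return lambda w_s, w_t: bi[(w_s, w_t)] / uni[w_s] if uni[w_s] else 0.0
```

A production system would instead use a smoothed language model, but the unsmoothed MLE estimate suffices to show where g.sub.i,s,p.sup.i,t,q comes from.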
[0036] α and β are parameters that adjust the relative weights of the sum of node scores and the sum of edge scores, and their optimal values are determined using verification data. z.sub.h is a binary variable that is 1 if the h-th independent word in W is included in the summary and 0 otherwise; a higher value of the third term therefore indicates that more important words are covered. Because the objective rewards covering many distinct important words, the redundancy of the generated summary is reduced.
[0037] Expression (2) is a constraint on the summary length and ensures that the number of letters of the summary is less than or equal to L. Expression (3) represents that at most one internal node (word) is selected from any external node. Expressions (4) and (5) represent that, as illustrated in
[0038] In addition, if language knowledge dictates that a particular word must appear in the summarized sentence, the n.sub.i,j,k corresponding to that word only needs to be constrained to 1.
[0039] The integer programming problem generating unit 16 passes the generated integer programming problem to the summary generating unit 18.
[0040] The summary generating unit 18 solves the integer programming problem received from the integer programming problem generating unit 16 using an existing dedicated solver to extract w.sub.i,j,k that makes n.sub.i,j,k=1, thereby generating a summary and outputting the generated summary.
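The patent solves the integer programming problem with an existing dedicated solver. Purely for illustration, the same selection can be carried out by brute-force enumeration on a toy lattice, applying the length constraint of Expression (2), the one-node-per-external-node constraint of Expression (3), and the three-term objective (node scores, edge scores, coverage of important words). All names here are illustrative assumptions, and edge scores are keyed on word pairs rather than node indices for brevity:

```python
from itertools import product

def best_summary(slots, f, g, important, L, alpha=1.0, beta=1.0):
    """Brute-force stand-in for the ILP of Expression (1).

    slots: list of external nodes, each a list of candidate words;
    f: (slot, rank) -> node score; g: (word, word) -> edge score;
    important: the independent-word set W; L: letter budget."""
    best, best_words = float("-inf"), []
    # rank -1 means the slot is skipped (at most one node per external node)
    for choice in product(*[range(-1, len(s)) for s in slots]):
        words, score, prev = [], 0.0, None
        for j, k in enumerate(choice):
            if k < 0:
                continue                          # slot skipped
            w = slots[j][k]
            score += alpha * f[(j, k)]            # node score term
            if prev is not None:
                score += beta * g.get((prev, w), 0.0)  # edge score term
            prev = w
            words.append(w)
        if sum(len(w) for w in words) > L:        # Expression (2)
            continue
        score += len(important & set(words))      # coverage term (z_h)
        if score > best:
            best, best_words = score, words
    return best_words
```

Enumeration is exponential in the number of external nodes, which is exactly why the patent formulates the selection as an integer programming problem and delegates it to a dedicated solver.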
[0041] Next, operation of the summary generating apparatus 10 according to the present embodiment will be described with reference to
[0042] In step S12, the speech recognition unit 12 converts an input utterance sequence into a k-best confusion network sequence. The speech recognition unit 12 passes the converted confusion network sequence to the lattice generating unit 14.
[0043] Next, in step S14, the lattice generating unit 14 converts the confusion network sequence received from the speech recognition unit 12 into a lattice sequence having candidate words of speech recognition results as internal nodes and a combination of k words among the candidate words for an identical speech as an external node. In addition, the lattice generating unit 14 prepares nodes designated as BOU and EOU representing start and end of utterance, respectively, and extends edges between internal nodes other than internal nodes belonging to the identical external node. The lattice generating unit 14 passes the generated lattice sequence to the integer programming problem generating unit 16.
[0044] Next, in step S16, the integer programming problem generating unit 16 generates an integer programming problem for selecting a path of internal nodes that maximizes an objective function including a score of an internal node, a score of an edge, and a coverage score of an important word, from the lattice sequence received from the lattice generating unit 14, under a constraint on the number of letters of a generated summary. The integer programming problem generating unit 16 passes the generated integer programming problem to the summary generating unit 18.
[0045] Next, in step S18, the summary generating unit 18 solves the integer programming problem received from the integer programming problem generating unit 16 using an existing dedicated solver, generates a summary using candidate words indicated by the internal nodes included in the path selected from the lattice sequence, and outputs the generated summary. The summary generation processing then ends.
[0046] As described above, the summary generating apparatus according to the present embodiment converts an input utterance sequence into a confusion network sequence constituted by a k-best of candidate words of speech recognition results. Furthermore, the summary generating apparatus according to the present embodiment generates a lattice sequence having the candidate words as internal nodes and a combination of k words among the candidate words for an identical speech as an external node, in which edges are extended between internal nodes other than internal nodes included in the identical external node, from the confusion network sequence. The summary generating apparatus according to the present embodiment further generates an integer programming problem for selecting a path that maximizes an objective function including at least a coverage score of an important word, among paths following internal nodes with edges extended, in the lattice sequence. Furthermore, the summary generating apparatus according to the present embodiment generates a summary of the utterance sequence using candidate words indicated by the internal nodes included in the path selected by solving the integer programming problem under a constraint on the length of the summary to be generated. With the processing described above, the summary generating apparatus according to the present embodiment can generate a high-quality summary having few speech recognition errors and low redundancy.
[0047] Note that the configuration and the processing of the summary generating apparatus described in the aforementioned embodiment are each merely an example and can be modified depending on circumstances without departing from the gist.
[0048] In addition, the flow of the processing of the program described in the aforementioned embodiment is also an example, and unnecessary steps may be deleted, new steps may be added, or the processing order may be changed without departing from the gist.
[0049] Although the aforementioned embodiment describes the case in which the processing is implemented by a software configuration using a computer executing the program, the embodiment is not limited thereto. The embodiment may be implemented by a hardware configuration or by a combination of a hardware configuration and a software configuration, for example.
REFERENCE SIGNS LIST
[0050] 10 Summary generating apparatus
[0051] 12 Speech recognition unit
[0052] 14 Lattice generating unit
[0053] 16 Integer programming problem generating unit
[0054] 18 Summary generating unit
[0055] 20 Language model DB
[0056] 22 IDF_DB