SUMMARY GENERATING DEVICE, SUMMARY GENERATING METHOD, AND COMPUTER PROGRAM PRODUCT
20170270949 · 2017-09-21
Inventors
- Kosei FUME (Kawasaki Kanagawa, JP)
- Taira ASHIKAWA (Kawasaki Kanagawa, JP)
- Masayuki ASHIKAWA (Kawasaki Kanagawa, JP)
- Takashi MASUKO (Kawasaki Kanagawa, JP)
Cpc classification
G10L15/02
PHYSICS
International classification
G10L15/02
PHYSICS
Abstract
A summary generating device includes a featural script extracting unit, a segment candidate generating unit, and a structuring estimating unit. The featural script extracting unit extracts featural script information of the words included in text information. Based on the extracted featural script information, the segment candidate generating unit generates candidates of segments that represent the constitutional units for display purposes. Based on the generated candidates of segments and on an estimation model for structuring, the structuring estimating unit estimates structure information containing information ranging from a comprehensive structure level to a local structure level.
Claims
1. A summary generating device comprising: a featural script extracting unit that extracts featural script information from text information; a segment candidate generating unit implemented in computer hardware comprising one or more hardware processors that, based at least in part on the extracted featural script information, generates candidates of segments representing constitutional units for display; and a structuring estimating unit implemented in computer hardware comprising one or more hardware processors that estimates structure information based at least in part on the generated candidates of segments and an estimation model for structuring, the structure information comprising information organized from a comprehensive structure level to a local structure level.
2. The summary generating device according to claim 1, further comprising a voice recognizing unit that performs voice recognition with respect to voice data and generates the text information based on the voice recognition of the voice data, wherein the featural script extracting unit extracts featural script of words included in the generated text information.
3. The summary generating device according to claim 2, wherein the voice recognizing unit generates the text information to include character information of utterances and timings of the utterances, and extracts audio features of the voice data, and the featural script extracting unit extracts timings corresponding to the character information and extracts the audio features as surrounding information with respect to the words.
4. The summary generating device according to claim 1, further comprising a display format converting unit that converts the estimated structure information into a display format for viewing.
5. The summary generating device according to claim 1, wherein, the structuring estimating unit presents the structure information with a priority determined based at least in part according to similarities between the structure information and the estimation model, the estimation model comprising learning data obtained from one or more prior processing results.
6. The summary generating device according to claim 4, wherein the display format converting unit changes the display format according to a user instruction.
7. A summary generating method implemented by a summary generating device, the summary generating method comprising: extracting featural script information of words from text information; generating, based at least in part on the extracted featural script information, candidates of segments representing constitutional units for display; and estimating, based at least in part on the generated candidates of segments and an estimation model for structuring, structure information comprising information organized from a comprehensive structure level to a local structure level.
8. A computer program product comprising a non-transitory computer readable medium that comprises a summary generating program, wherein the summary generating program, when executed by a computer, causes the computer to perform: extracting featural script information of words from text information; generating, based at least in part on the extracted featural script information, candidates of segments representing constitutional units for display; and estimating, based at least in part on the generated candidates of segments and an estimation model for structuring, structure information comprising information organized from a comprehensive structure level to a local structure level.
Description
DETAILED DESCRIPTION
[0020] According to one embodiment, a summary generating device includes a featural script extracting unit, a segment candidate generating unit, and a structuring estimating unit. The featural script extracting unit extracts featural script information of the words included in text information. Based on the extracted featural script information, the segment candidate generating unit generates candidates of segments that represent the constitutional units for display purposes. Based on the generated candidates of segments and on an estimation model for structuring, the structuring estimating unit estimates structure information containing information organized from a comprehensive structure level to a local structure level.
[0021] Embodiment
[0022]
[0023] In the configuration described above, the terminal device 200 obtains voice data of a meeting and sends that voice data to the summary generating device 100 via a network. The voice data is obtained from a microphone that is connected to the terminal device 200. In a meeting, either a single microphone or a plurality of microphones can be used. Because a meeting is sometimes conducted across different locations, the summary generating system 1 may include a plurality of terminal devices 200. Herein, the terminal device 200 is an information device such as a personal computer (PC) or a tablet terminal.
[0024] The summary generating device 100 obtains voice data from the terminal device 200, detects an explicit summary request of a speaker or an expression for a structuring request included in a speech, and estimates appropriate display units (segments). Then, in response to a termination instruction from a speaker, the summary generating device 100 rearranges the segments depending on the contents thereof, converts them into various display formats, and outputs them. Herein, the summary generating device 100 is an information processing device such as a server device.
[0025]
[0026] The CPU 12 controls the operations of the entire summary generating device 100 by executing computer programs stored in the ROM 13 while using the RAM 14 as the work area. The RAM 14 temporarily stores the information related to various operations and serves as the work area during the execution of those computer programs. The ROM 13 stores the computer programs for implementing the operations of the summary generating device 100. The communicating unit 15 communicates with external devices such as the terminal device 200 via a network in a wireless manner or a wired manner. Meanwhile, the hardware configuration illustrated in
[0027]
[0028] The voice recognizing unit 110 performs a voice recognition operation with respect to voice data. More particularly, the voice recognizing unit 110 receives input of voice data that is sent from the terminal device 200. Then, the voice recognizing unit 110 performs a voice recognition operation, and generates text information containing character data of the utterances and information about the timings of utterances.
[0029] Moreover, the voice recognizing unit 110 identifies utterance sections and silent sections as the audio features of the voice data, and detects the duration of those sections. Meanwhile, the voice recognizing unit 110 need not be included in the summary generating device 100. In that case, the featural script extracting unit 120 at the subsequent stage operates on the results of a voice recognition operation and an audio feature extraction operation that are performed elsewhere.
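As an illustration of how silent sections and their durations could be derived from utterance timings, the following is a minimal Python sketch. The `Utterance` record, the `silent_sections` function, and the 0.5-second `min_gap` threshold are assumptions made for this example; the embodiment does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical record for one recognized utterance (times in seconds)."""
    text: str
    start: float
    end: float


def silent_sections(utterances, min_gap=0.5):
    """Return (start, end, duration) for each gap between consecutive utterances.

    Gaps shorter than min_gap (an assumed threshold) are not reported.
    """
    ordered = sorted(utterances, key=lambda u: u.start)
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt.start - prev.end
        if gap >= min_gap:
            gaps.append((prev.end, nxt.start, gap))
    return gaps


utterances = [
    Utterance("start of bullet points", 0.0, 1.5),
    Utterance("next meeting on Friday", 3.0, 5.0),
    Utterance("in room A", 5.25, 6.0),
]
gaps = silent_sections(utterances)
```

Here only the pause between the first and second utterances exceeds the assumed threshold, so a single silent section is reported.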
[0030] The featural script extracting unit 120 extracts featural script information included in the text information. More particularly, the featural script extracting unit 120 performs morphological analysis with respect to the text information generated by the voice recognizing unit 110.
[0031]
[0032] Subsequently, the featural script extracting unit 120 performs segment label determination with respect to the text information. A segment label is a name expressing the role of a segment (a display unit). It is metadata assigned depending on which of the following is included: the meaning class or the property information of a part of speech extracted at an earlier stage; the text of an utterance that has no such meaning class or property information; or a command (instruction) for structuring. For example, a command for structuring represents an instruction to start structuring, and examples thereof include “start of bullet points”, “table begins here”, or “tabular format begins here”. Moreover, the featural script extracting unit 120 assigns utterance sections and silent sections, which are detected by the voice recognizing unit 110, as surrounding information.
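The segment label determination described above might be sketched as follows in Python. The label names (`START_BULLETS`, `START_TABLE`, `CONTENT:...`, `PLAIN_TEXT`), the simple substring matching, and the function name `segment_label` are all assumptions for illustration; only the three command phrases come from the description.

```python
# Command phrases taken from the description; the label values are hypothetical.
STRUCTURING_COMMANDS = {
    "start of bullet points": "START_BULLETS",
    "table begins here": "START_TABLE",
    "tabular format begins here": "START_TABLE",
}


def segment_label(utterance_text, meaning_classes):
    """Assign a segment label to one utterance.

    meaning_classes is the set of meaning classes found in the utterance
    by an earlier analysis stage (assumed to be given).
    """
    normalized = utterance_text.strip().lower()
    # A structuring command takes precedence over content labels.
    for phrase, label in STRUCTURING_COMMANDS.items():
        if phrase in normalized:
            return label
    # Otherwise label by the meaning classes present, if any.
    if meaning_classes:
        return "CONTENT:" + "+".join(sorted(meaning_classes))
    # Text with no meaning class or property information.
    return "PLAIN_TEXT"
```

For instance, an utterance containing "table begins here" would receive the assumed `START_TABLE` label regardless of the other words around it.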
[0033] Meanwhile, information originating from the summary generating system 1 can also be used as the featural script information. For example, the featural script extracting unit 120 obtains the following system-originating information if available: a speaker ID detected based on the login user of a microphone or of the connected terminal device 200; meeting information such as the meeting title, the time of the meeting, the participants, and the meeting room, which can be referred to in tandem with the usage timing of the meeting room and the scheduler; and detailed meeting information such as information on the individual speakers who input voice during the meeting.
[0034] The segment candidate generating unit 130 generates variations of the candidates of the smallest constitutional units for structuring. Examples of such candidates include, from the coarsest to the finest, character strings partitioned by units such as speakers, paragraphs, phrases, sequences of the same character type such as Kanji or Katakana, meaning classes, words, and parts of speech. More particularly, the segment candidate generating unit 130 reads the text information generated by the voice recognizing unit 110 and reads the featural script information extracted by the featural script extracting unit 120. Then, the segment candidate generating unit 130 detects the segment label present in each set of featural script information. For example, in the segment label detection, a start instruction, a termination instruction, or a label providing a clue for structuring is detected.
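To make the idea of candidates at several granularities concrete, here is a minimal Python sketch covering three of the units listed above (speakers, phrases, and words). The input shape (a list of `(speaker, text)` pairs), the punctuation-based phrase split, and the function name are assumptions for this example; morphological analysis, meaning classes, and character-type runs are omitted.

```python
import re
from itertools import groupby


def candidates_by_granularity(utterances):
    """Partition (speaker, text) utterances at three granularities.

    Returns a dict mapping an assumed granularity name to a list of strings.
    """
    # Coarsest: merge consecutive utterances by the same speaker.
    by_speaker = [
        " ".join(text for _, text in turn)
        for _, turn in groupby(utterances, key=lambda u: u[0])
    ]
    full_text = " ".join(text for _, text in utterances)
    # Middle: split on simple punctuation as a stand-in for phrase detection.
    by_phrase = [p.strip() for p in re.split(r"[,.;]", full_text) if p.strip()]
    # Finest: individual word tokens.
    by_word = re.findall(r"\w+", full_text)
    return {"speaker": by_speaker, "phrase": by_phrase, "word": by_word}


example = [
    ("A", "Next meeting is Friday,"),
    ("A", "in room A."),
    ("B", "Understood."),
]
result = candidates_by_granularity(example)
```

Each list in `result` is one alternative segmentation of the same text, which a later stage could choose among.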
[0035] Then, the segment candidate generating unit 130 performs grouping of the sets of featural script information that have been read and stored so far. For example, in the grouping, the regular, repeated appearance of similar elements is detected, or the appearance patterns of featural script information of different types are detected, and the units of such repetitions are grouped together. As an example of similar elements, a set of three elements such as a date, a location, and arbitrary text may appear repeatedly at regular intervals.
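One simple way to detect such a repetition is to look for the shortest sequence of element types that tiles the whole input exactly. The Python sketch below does this; it is a deliberately strict stand-in (it requires exact, uninterrupted repetition), and the function names and the `(type, text)` item shape are assumptions for illustration.

```python
def smallest_repeating_unit(types):
    """Return the length of the shortest prefix that repeats to form types.

    Returns None when the sequence does not consist of at least two
    exact repetitions of some prefix.
    """
    n = len(types)
    for size in range(1, n // 2 + 1):
        if n % size == 0 and types == types[:size] * (n // size):
            return size
    return None


def group_by_repetition(items):
    """Group (type, text) items into repeated units such as date/location/text."""
    types = [item_type for item_type, _ in items]
    size = smallest_repeating_unit(types)
    if size is None:
        return [items]  # no regular repetition found: keep as one group
    return [items[i:i + size] for i in range(0, len(items), size)]


items = [
    ("DATE", "Friday"), ("LOCATION", "room A"), ("TEXT", "budget review"),
    ("DATE", "Monday"), ("LOCATION", "room B"), ("TEXT", "design review"),
]
groups = group_by_repetition(items)
```

With the six items above, the date/location/text triple is detected as the repeating unit and two groups of three are produced.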
[0036] Meanwhile, if a termination instruction regarding structuring is included in a segment label, then the segment candidate generating unit 130 performs ordering of the sets of featural script information that have been grouped so far. Examples of ordering include the following methods: a method in which the order of the types of featural script is defined in advance and applied in a fixed manner; a method in which the ordering is performed based on the character length (average character length) of the specific instances of the extracted featural script information included in each featural script; and a method in which the ordering is performed based on the number of occurrences of a particular element (meaning class).
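The three ordering methods listed above could look roughly like the following Python sketch. The dictionary layout of a group (`type`, `texts`, `classes`), the function names, and the sort directions (shortest average length first, highest count first) are assumptions; the description does not state which direction is used.

```python
def order_fixed(groups, type_order):
    """Order groups by a type order defined in advance; unknown types go last."""
    rank = {t: i for i, t in enumerate(type_order)}
    return sorted(groups, key=lambda g: rank.get(g["type"], len(rank)))


def order_by_average_length(groups):
    """Order groups by the average character length of their texts (ascending)."""
    def average_length(group):
        texts = group["texts"]
        return sum(len(t) for t in texts) / len(texts) if texts else 0.0
    return sorted(groups, key=average_length)


def order_by_class_count(groups, meaning_class):
    """Order groups by how often a given meaning class occurs (descending)."""
    return sorted(groups, key=lambda g: g["classes"].count(meaning_class), reverse=True)


groups = [
    {"type": "TEXT", "texts": ["bring the quarterly report"], "classes": []},
    {"type": "DATE", "texts": ["Friday", "Monday"], "classes": ["DATE", "DATE"]},
    {"type": "LOCATION", "texts": ["meeting room A"], "classes": ["LOCATION"]},
]
fixed = order_fixed(groups, ["DATE", "LOCATION", "TEXT"])
by_length = order_by_average_length(groups)
by_count = order_by_class_count(groups, "DATE")
```

Keeping each method as a separate function makes it easy to swap the ordering strategy without touching the grouping step.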
[0037]
[0038] The structuring estimating unit 140 estimates structure information based on the segment information. More specifically, the structuring estimating unit 140 reads the segment information generated by the segment candidate generating unit 130. Then, the structuring estimating unit 140 reads a structure estimation model from the structure estimation model 150. Herein, the structure estimation model is obtained by learning, as learning data, the exemplary formats suitable for display and the results edited or decided in the past. Based on such a structure estimation model, the structuring estimating unit 140 assigns combinations and patterns of appearances of the featural script information, and presents suitable structuring candidates in an ordered manner. In the initial presentation of the structure information, the structuring result having the highest likelihood is presented from among the ordered segment patterns.
[0039] Then, the structuring estimating unit 140 receives a decision instruction from a user. Herein, the instruction from the user is received via the instructing unit 160. For example, if the user has no issue with the current presentation candidates, a structuring result with the decided presentation candidates is presented. On the other hand, if a decision instruction from the user cannot be obtained (i.e., if a request for presentation of the next candidate is received), then the next structuring result is presented. The next structuring result can be produced not only by changing the combination of segments but also by tracing back the manner in which the segments were retrieved and changing the segment variation. Meanwhile, the structuring result can be presented as output from either the terminal device 200 or the summary generating device 100.
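The present-then-confirm behavior described above can be sketched in Python as a loop over candidates ranked by likelihood. The candidate dictionaries, the `likelihood` values, and the `user_accepts` callback (standing in for the instructing unit 160) are hypothetical; returning `None` when every candidate is rejected is also an assumption, since the description does not cover that case.

```python
def present_candidates(candidates, user_accepts):
    """Present candidates from highest to lowest likelihood until one is accepted.

    user_accepts is a callback returning True for a decision instruction
    and False for a request to see the next candidate.
    """
    ranked = sorted(candidates, key=lambda c: c["likelihood"], reverse=True)
    for candidate in ranked:
        if user_accepts(candidate):
            return candidate  # decided structure information
    return None  # assumed behavior when no candidate is accepted


candidates = [
    {"format": "table", "likelihood": 0.62},
    {"format": "bullets", "likelihood": 0.81},
    {"format": "paragraph", "likelihood": 0.15},
]
shown = []


def accepts_table(candidate):
    # Simulated user who rejects everything except a table.
    shown.append(candidate["format"])
    return candidate["format"] == "table"


decided = present_candidates(candidates, accepts_table)
```

In this run the bullet-point candidate is shown first because it has the highest assumed likelihood, and the table is decided on the second presentation.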
[0040]
[0041] The display format converting unit 170 converts the decided structuring result into a display format for user viewing. More particularly, the display format converting unit 170 reads the structuring result decided by the structuring estimating unit 140. Then, the display format converting unit 170 reads a display format conversion model. In the display format conversion model, definition patterns regarding the display format to be used for presentation are written corresponding to the structuring results. Cascading Style Sheets (CSS) or XSL Transformations (XSLT) can be used for writing the definition patterns.
[0042] Subsequently, the display format converting unit 170 presents the initial conversion result according to the structuring result and the display format conversion model. In response to that presentation, if a decision instruction is received from the user via the instructing unit 160, then the display format converting unit 170 outputs the conversion result as a summary document. On the other hand, if a decision instruction from the user cannot be obtained (i.e., if a request for presentation of the next candidate is received), then the conversion result having the next highest likelihood is presented. Meanwhile, the conversion result can be presented as output from either the terminal device 200 or the summary generating device 100.
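As a rough illustration of converting a decided structuring result into a display format, the Python sketch below renders the same rows either as a bullet list or as a table in HTML. It uses plain Python functions in place of the CSS or XSLT definition patterns mentioned above; the `CONVERTERS` mapping, the function names, and the row/header data shape are assumptions for this example.

```python
from html import escape


def to_bullets(rows, headers):
    """Render each row as one bullet; headers are unused in this format."""
    items = "".join(f"<li>{escape(' / '.join(row))}</li>" for row in rows)
    return f"<ul>{items}</ul>"


def to_table(rows, headers):
    """Render rows as an HTML table with one header row."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


# Hypothetical stand-in for a display format conversion model:
# a mapping from a format name to the function that renders it.
CONVERTERS = {"bullets": to_bullets, "table": to_table}

rows = [["Friday", "room A"], ["Monday", "room B"]]
headers = ["Date", "Place"]
table_html = CONVERTERS["table"](rows, headers)
bullets_html = CONVERTERS["bullets"](rows, headers)
```

Because both converters share one signature, switching the display format in response to a user instruction only requires looking up a different key.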
[0043]
[0044]
[0045] Subsequently, the featural script extracting unit 120 performs segment label determination with respect to the text information (Step S105). Then, the featural script extracting unit 120 assigns utterance sections and silent sections, which are detected by the voice recognizing unit 110, as surrounding information (Step S106). Subsequently, the featural script extracting unit 120 detects, as system-originating information, a speaker ID based on the login user of a microphone or the terminal device 200 (Step S107). Then, the featural script extracting unit 120 detects detailed meeting information managed by an external device (Step S108).
[0046]
[0047] Subsequently, the segment candidate generating unit 130 determines whether or not a termination instruction regarding structuring is included in the segment label (Step S205). If a termination instruction regarding structuring is included in the segment label (Yes at Step S205), then the segment candidate generating unit 130 performs ordering of the sets of featural script information that have been grouped (Step S206). However, if a termination instruction regarding structuring is not included in the segment label (No at Step S205), then the system control returns to Step S201.
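The branch at Step S205 can be pictured as a loop that keeps accumulating featural script information until a termination label arrives and only then performs the ordering. The Python sketch below is a simplified stand-in: the label name `END_STRUCTURING`, the `(label, feature)` input shape, and ordering by string length are assumptions, and the grouping step is reduced to appending to a list.

```python
def collect_until_termination(labeled_features, order_key=len):
    """Accumulate features until a termination label is seen, then order them.

    labeled_features is an iterable of (label, feature) pairs, where feature
    is None for pure command utterances. Returns None if no termination
    label is encountered.
    """
    grouped = []
    for label, feature in labeled_features:
        if feature is not None:
            grouped.append(feature)  # simplified grouping
        if label == "END_STRUCTURING":
            # Yes at Step S205: order what has been grouped (Step S206).
            return sorted(grouped, key=order_key)
        # No at Step S205: continue with the next input.
    return None


stream = [
    ("START_BULLETS", None),
    ("CONTENT", "bring the quarterly report"),
    ("CONTENT", "Friday"),
    ("END_STRUCTURING", None),
    ("CONTENT", "ignored after termination"),
]
ordered = collect_until_termination(stream)
```

Input that arrives after the termination label is not consumed by this call, which mirrors the idea that ordering is triggered once per structuring request.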
[0048]
[0049] When a decision instruction is received from the user in response to the presentation of structure information (Yes at Step S304), the structuring estimating unit 140 assigns the presented candidate as the decided structure information (Step S305). However, if a decision instruction cannot be received from the user in response to the presentation of structure information (i.e., if a request for presentation of the next candidate is received) (No at Step S304), then the structuring estimating unit 140 presents the candidate of the structure information having the next highest score (Step S306). After a candidate is presented, the system control returns to Step S304 and a decision instruction from the user is awaited.
[0050]
[0051] Subsequently, when a decision instruction from the user is received in response to the presentation of the conversion result (Yes at Step S404), the display format converting unit 170 outputs the conversion result as a summary document (Step S405). However, if a decision instruction from the user cannot be received in response to the presentation of the conversion result (i.e., if a request for presentation of the next candidate is received) (No at Step S404), then the display format converting unit 170 presents the candidate having the next highest score of the conversion result (Step S406). After the candidate is presented, the system control returns to Step S404 and a decision instruction from the user is awaited.
[0052] According to the embodiment, from the result of voice recognition performed with respect to voice data, segments are estimated based on an explicit instruction by the speaker or based on an expression for a structuring request. Then, the segments are rearranged depending on the contents thereof and are presented upon being converted into various display formats. As a result, it becomes possible to cut down on the time and effort required for advance preparation.
[0053] Meanwhile, the summary generating device 100 according to the embodiment can be implemented using, for example, a general-purpose computer device serving as the basic hardware. The computer programs that are executed contain modules for the constituent elements described above. The computer programs can be provided by being recorded as installable files or executable files in a computer-readable recording medium such as a compact disk read only memory (CD-ROM), a compact disk recordable (CD-R), or a digital versatile disk (DVD), or can be provided by being stored in advance in a ROM.
[0054] While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.