SENTENCE EXTRACTING APPARATUS, PROGRAM
20190243846 · 2019-08-08
Abstract
A sentence extracting apparatus includes: a hardware processor that analyzes a logical configuration of a document; extracts a first sentence including a specific key word from the document; and extracts, as a related sentence, another sentence located in a predetermined range starting from the first sentence, in the logical configuration.
Claims
1. A sentence extracting apparatus comprising: a hardware processor that analyzes a logical configuration of a document; extracts a first sentence including a specific key word from the document; and extracts, as a related sentence, another sentence located in a predetermined range starting from the first sentence, in the logical configuration.
2. The sentence extracting apparatus according to claim 1, wherein the document has a hierarchical structure.
3. The sentence extracting apparatus according to claim 2, wherein the hardware processor extracts a sentence as the related sentence, the sentence belonging to a lower hierarchy than a hierarchy to which the first sentence belongs in the logical configuration, the sentence being located at a place branched from the first sentence.
4. The sentence extracting apparatus according to claim 2, wherein the hardware processor extracts another sentence as the related sentence, the other sentence being in a hierarchy identical to a hierarchy to which the first sentence belongs in the logical configuration, the other sentence being at a position branched from a sentence that is a branching source of the first sentence.
5. The sentence extracting apparatus according to claim 1, wherein the hardware processor extracts a sentence as a first sentence when a character string included in the sentence matches a character string registered in advance.
6. A non-transitory recording medium storing a computer readable program causing an information processing apparatus to operate as the sentence extracting apparatus according to claim 1.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0018] The advantages and features provided by one or more embodiments of the invention will become more fully understood from the detailed description given hereinbelow and the appended drawings which are given by way of illustration only, and thus are not intended as a definition of the limits of the present invention:
DETAILED DESCRIPTION OF EMBODIMENTS
[0033] Hereinafter, one or more embodiments of the present invention will be described with reference to the drawings. However, the scope of the invention is not limited to the disclosed embodiments.
First Embodiment
[0035] The PC 5 is a terminal device, such as a personal computer, used by a user. The PC 5 includes a central processing unit (CPU), read only memory (ROM), random access memory (RAM), and the like, and operates on the basis of various programs such as an operating system (OS) and an application program. In the embodiment of the present invention, in addition to creating and storing a document, the PC 5 inputs the document to the server 10 and requests the server 10 to extract a specific sentence from the input document.
[0036] When the document is input from the PC 5, the server 10 extracts the specific sentence from the document and returns the extraction result to the PC 5. The document to be input to the server 10 is a document having a hierarchical structure (tree structure) that is classified into a chapter, a section, an item, a body text, and the like.
[0037] In the embodiment of the present invention, the server 10 analyzes the logical configuration of the document and extracts a sentence including a specific key word (referred to as a first sentence). In addition, another sentence located in a predetermined range starting from the first sentence in the logical configuration of the document is extracted as a related sentence.
[0038] Specifically, the related sentence is extracted by the following two methods.
(Related Sentence Extracting Method 1)
[0039] When the first sentence is a sentence belonging to an upper hierarchy of hierarchies constituting the document, such as a chapter or a section, a sentence of a lower hierarchy branching from the first sentence is extracted as a related sentence.
(Related Sentence Extracting Method 2)
[0040] A sentence that belongs to the same hierarchy as the first sentence and branches from the same branching source sentence as the first sentence is extracted as a related sentence. The first sentence may belong to any hierarchy other than the highest one; in the embodiment of the present invention, however, this method is applied only when the first sentence belongs to the lowest hierarchy.
[0041] In a document, a chapter or a section title often consists only of fragmentary words, and the details are usually described in the body text. In addition, one body text may supplement the contents of another body text. According to the present invention, not only the sentence including the specific key word but also other sentences that are likely to complement its contents can be extracted, so the user is less likely to have to go back and read other sentences than when only the sentence including the specific key word is extracted.
[0043] The CPU 11 operates on the basis of the OS program, and executes middleware, application programs, and the like on the OS program. The ROM 12 and the hard disk device 15 store various programs, and the CPU 11 executes various types of processing according to these programs, whereby functions of the server 10 are implemented.
[0044] The RAM 13 is used as a work memory that temporarily stores various data and an image memory that stores image data when the CPU 11 executes processing on the basis of the program.
[0045] The nonvolatile memory 14 is a memory (flash memory) whose stored contents are retained even when the power supply is turned off, and it is used for storing various types of setting information and the like. The hard disk device 15 is a large capacity nonvolatile storage device, and stores various programs and data in addition to image data and the like. In the embodiment of the present invention, documents input from the PC 5, a history of documents on which scoring has been performed, each key word and its weight value, and the like are stored.
[0046] The network communication unit 16 functions to communicate with the PC 5 and other external devices via the network 3.
[0047] Further, in the embodiment of the present invention, the CPU 11 serves as an analyzer 30 that analyzes the logical configuration of a document, a sentence extractor 31 that extracts a first sentence including a specific key word from the document, and a related sentence extractor 32 that extracts, as a related sentence, another sentence located in a predetermined range starting from the first sentence, in the logical configuration of the document.
[0048] In the embodiment of the present invention, the server 10 first analyzes the document to determine its logical configuration.
[0049] In
[0050] A document 100 of
First product development department creation date and time Apr. 21, 2017
1. Technology development
    1-1 Theme A
        There are some imperfections in countermeasures against periodic defects, and re-countermeasures are being carried out.
    1-2 Theme B
        It is in progress as planned.
2. Product development
    2-1 Theme A
        Development has been completed
    2-2 Theme B
        There is no prospect of repairing faults, and the schedule is expected to be delayed.
3. Market problem
    3-1 Theme A
        Paper wrinkle problems have occurred frequently in initial lot.
    3-2 Theme B
        The effect of the countermeasure product is being confirmed at the customer OO.
[0063] When the document is split at each sentence-ending punctuation mark and each line feed, it can be decomposed into the following sentences 1 to 16.
[0064] Sentence 1: First product development department creation date and time Apr. 21, 2017
[0065] Sentence 2: 1. Technology development
[0066] Sentence 3: 1-1 Theme A
[0067] Sentence 4: There are some imperfections in countermeasures against periodic defects, and re-countermeasures are being carried out.
[0068] Sentence 5: 1-2 Theme B
[0069] Sentence 6: It is in progress as planned.
[0070] Sentence 7: 2. Product development
[0071] Sentence 8: 2-1 Theme A
[0072] Sentence 9: Development has been completed
[0073] Sentence 10: 2-2 Theme B
[0074] Sentence 11: There is no prospect of repairing faults, and the schedule is expected to be delayed.
[0075] Sentence 12: 3. Market problem
[0076] Sentence 13: 3-1 Theme A
[0077] Sentence 14: Paper wrinkle problems have occurred frequently in initial lot.
[0078] Sentence 15: 3-2 Theme B
[0079] Sentence 16: The effect of the countermeasure product is being confirmed at the customer OO.
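The decomposition into sentences 1 to 16 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the method of the embodiment: every line feed is treated as a sentence boundary, and, as a hypothetical heuristic for English text, a period, question mark, or exclamation mark also ends a sentence when it follows a lowercase letter and precedes whitespace and an uppercase letter (a Japanese full stop always ends a sentence). The names `DOCUMENT_100`, `SENTENCE_END`, and `split_sentences` are illustrative.

```python
import re

# Document 100 with one unit per line, as reproduced in the text above.
DOCUMENT_100 = """\
First product development department creation date and time Apr. 21, 2017
1. Technology development
1-1 Theme A
There are some imperfections in countermeasures against periodic defects, and re-countermeasures are being carried out.
1-2 Theme B
It is in progress as planned.
2. Product development
2-1 Theme A
Development has been completed
2-2 Theme B
There is no prospect of repairing faults, and the schedule is expected to be delayed.
3. Market problem
3-1 Theme A
Paper wrinkle problems have occurred frequently in initial lot.
3-2 Theme B
The effect of the countermeasure product is being confirmed at the customer OO."""

# Hypothetical boundary rule: after a Japanese full stop, or after ".", "!" or "?"
# that follows a lowercase letter and is followed by whitespace and an uppercase
# letter. This keeps "Apr. 21" and "1. Technology" in one piece.
SENTENCE_END = re.compile(r"(?<=。)|(?<=[a-z][.!?])\s+(?=[A-Z])")


def split_sentences(text: str) -> list[str]:
    """Split at every line feed, then at sentence-ending punctuation."""
    sentences: list[str] = []
    for line in text.splitlines():
        for piece in SENTENCE_END.split(line):
            piece = piece.strip()
            if piece:
                sentences.append(piece)
    return sentences


for number, sentence in enumerate(split_sentences(DOCUMENT_100), start=1):
    print(f"Sentence {number}: {sentence}")
```

A plain split at every period would break "Apr. 21, 2017" and "1. Technology development" apart, which is why the sketch restricts the period rule; a real implementation would need a proper sentence segmenter for the document's language.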
[0080] After the document 100 is decomposed into the sixteen sentences, the server 10 analyzes the structure of the document. Any method can be used to analyze the document structure; in the embodiment of the present invention, whether each sentence is a chapter, a section, an item, or a body text, and the hierarchical relationships among the sentences, are determined from the indentation, the sequential numbering, and the like.
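The structure analysis can be approximated from sequential numbering alone. The following minimal sketch is an assumption, not the analyzer 30 itself: the numbering rules ("1." marks a chapter, "1-1" marks a section) are hypothetical, every other sentence is treated as body text, and each sentence is attached to the nearest preceding sentence of a higher hierarchy, which becomes its branching source. Items and indentation, which the embodiment also uses, are not handled. The names `Node` and `analyze` are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical numbering rules for the hierarchy.
CHAPTER = re.compile(r"^\d+\.\s")
SECTION = re.compile(r"^\d+-\d+\s")
LEVELS = {"chapter": 1, "section": 2, "body": 3}


@dataclass
class Node:
    number: int            # 1-based sentence number
    text: str
    kind: str              # "header", "chapter", "section" or "body"
    parent: Optional[int]  # sentence number of the branching source


def analyze(sentences: list[str]) -> list[Node]:
    nodes: list[Node] = []
    stack: list[Node] = []  # open headings, highest hierarchy first
    for number, text in enumerate(sentences, start=1):
        if SECTION.match(text):
            kind = "section"
        elif CHAPTER.match(text):
            kind = "chapter"
        else:
            kind = "body"
        # Close headings at the same or a lower hierarchy than this sentence.
        while stack and LEVELS[stack[-1].kind] >= LEVELS[kind]:
            stack.pop()
        if kind == "body" and not stack:
            kind = "header"  # text before the first chapter, e.g. sentence 1
        node = Node(number, text, kind, stack[-1].number if stack else None)
        nodes.append(node)
        if kind in ("chapter", "section"):
            stack.append(node)
    return nodes


for node in analyze(["Title line", "1. Intro", "1-1 Topic", "Body one.", "1-2 Other", "Body two."]):
    print(node.number, node.kind, node.parent)
```

Applied to the sixteen sentences of document 100, these rules would make sentence 3 the branching source of sentence 4, sentence 12 the branching source of sentences 13 and 15, and so on.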
[0082] Next, the server 10 detects sentences including a specific key word among the sentences obtained by the decomposition. In the embodiment of the present invention, a character string serving as the specific key word is registered in the server 10 in advance, and a sentence is detected when it contains the registered character string.
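The detection step amounts to substring matching against the registered character strings. In this sketch, the key words `defect`, `fault`, and `problem` are hypothetical stand-ins chosen because they occur in the sentences 4, 11, 12, and 14 of document 100; the key words actually registered are not stated here. Case-insensitive matching and the name `find_first_sentences` are also assumptions.

```python
# Hypothetical registered key words (stand-ins, not the embodiment's list).
REGISTERED_KEY_WORDS = ["defect", "fault", "problem"]


def find_first_sentences(sentences: list[str], key_words: list[str]) -> list[int]:
    """Return the 1-based numbers of sentences containing a registered key word."""
    hits: list[int] = []
    for number, sentence in enumerate(sentences, start=1):
        lowered = sentence.lower()
        if any(word.lower() in lowered for word in key_words):
            hits.append(number)
    return hits


sample = [
    "1. Technology development",
    "There are some imperfections in countermeasures against periodic defects, "
    "and re-countermeasures are being carried out.",
    "It is in progress as planned.",
    "3. Market problem",
]
print(find_first_sentences(sample, REGISTERED_KEY_WORDS))  # [2, 4]
```

Because plain substring matching is used, "defect" also matches "defects" and "problem" matches "problems"; with the sixteen sentences of document 100 as input, these stand-in key words select the sentences 4, 11, 12, and 14.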
[0084] Next, a case will be described in which the sentences 4, 11, 12, and 14 are the first sentences and another sentence located in a predetermined range starting from each first sentence is extracted as a related sentence by the above-described related sentence extracting method 1.
[0085] In the related sentence extracting method 1, the sentences extracted as first sentences are first searched for a sentence belonging to a higher hierarchy than the body text. Among the above-described sentences 4, 11, 12, and 14, only the sentence 12 belongs to a higher hierarchy than the body text (see
[0087] In the embodiment of the present invention, when a body-text sentence is extracted, its branching source sentences are extracted in order from that sentence toward the upper hierarchy, and the extracted sentences are output as a list.
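The upward tracing of branching sources can be sketched with a parent map. The map below encodes the hierarchy of document 100 as described in the text (a header line, three chapters, two sections per chapter, one body text per section); representing the analysis result as a dictionary, and the names `PARENT`, `branch_path`, and `extraction_list`, are illustrative assumptions.

```python
from typing import Optional

# Hypothetical parent map for document 100: sentence number -> branching
# source; None marks a sentence without a branching source.
PARENT: dict[int, Optional[int]] = {
    1: None, 2: None, 3: 2, 4: 3, 5: 2, 6: 5, 7: None, 8: 7,
    9: 8, 10: 7, 11: 10, 12: None, 13: 12, 14: 13, 15: 12, 16: 15,
}


def branch_path(number: int, parent: dict[int, Optional[int]]) -> list[int]:
    """Trace branching sources upward from `number`; return them top-down."""
    path = [number]
    source = parent[number]
    while source is not None:
        path.append(source)
        source = parent[source]
    return path[::-1]


def extraction_list(body_numbers: list[int], parent: dict[int, Optional[int]]) -> list[list[int]]:
    """One entry per extracted body-text sentence, in document order."""
    return [branch_path(n, parent) for n in sorted(body_numbers)]


print(extraction_list([14, 4, 11], PARENT))  # [[2, 3, 4], [7, 10, 11], [12, 13, 14]]
```

Each entry reads from the chapter down to the body text, so a body text such as sentence 4 is shown together with "1. Technology development" and "1-1 Theme A", which supply its context.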
[0088] The list of
[0090] The sentence extractor 31 serves as a dictionary matching unit 43 that compares each sentence with the key words indicated by the problem word dictionary 42A and a problem information database 42B and extracts a sentence including a key word as a first sentence. The related sentence extractor 32 serves as a subordinate sentence extractor 44 that extracts, as related sentences, the body texts branching from a first sentence into the lower hierarchy. The hard disk device 15 further serves as a storage for storing the list described in
[0092] Next, among the plurality of sentences, each sentence including a key word registered in advance is extracted as a first sentence (step S103). When none of the extracted first sentences belongs to a hierarchy higher than the body text (step S104; No), the processing proceeds to step S106. When an extracted first sentence belongs to a hierarchy higher than the body text (step S104; Yes), the body texts in the lower hierarchy branching from that sentence are acquired as related sentences (step S105).
[0093] Thereafter, a list is created and stored in which the extracted first sentences and related sentences that are body texts are collected together with information on the upper hierarchy that is the branching source of each body text (step S106), and the processing ends.
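Steps S104 to S106 can be combined into one short sketch. It assumes the analysis result of document 100 is available as a dictionary of (kind, branching source) pairs and that the first sentences are 4, 11, 12, and 14 as in the example; the encoding and the names `NODES`, `method_1`, and `build_list` are hypothetical, and the code is an illustration of the described flow, not the embodiment's implementation.

```python
from typing import Optional

Nodes = dict[int, tuple[str, Optional[int]]]

# Hypothetical encoding of the analysis result for document 100:
# sentence number -> (kind, branching source sentence number or None).
NODES: Nodes = {
    1: ("header", None), 2: ("chapter", None), 3: ("section", 2), 4: ("body", 3),
    5: ("section", 2), 6: ("body", 5), 7: ("chapter", None), 8: ("section", 7),
    9: ("body", 8), 10: ("section", 7), 11: ("body", 10), 12: ("chapter", None),
    13: ("section", 12), 14: ("body", 13), 15: ("section", 12), 16: ("body", 15),
}
UPPER_KINDS = {"chapter", "section"}  # hierarchies above the body text


def branch_path(number: int, nodes: Nodes) -> list[int]:
    """Branching sources from the top level down to `number`, inclusive."""
    path = [number]
    while nodes[path[-1]][1] is not None:
        path.append(nodes[path[-1]][1])
    return path[::-1]


def method_1(first_sentences: list[int], nodes: Nodes) -> list[int]:
    """Steps S104-S105: body texts branching from upper-hierarchy first sentences."""
    related: list[int] = []
    for first in first_sentences:
        if nodes[first][0] not in UPPER_KINDS:  # S104: No for this sentence
            continue
        for number, (kind, _) in nodes.items():
            ancestors = branch_path(number, nodes)[:-1]
            if kind == "body" and first in ancestors and number not in related:
                related.append(number)
    return related


def build_list(first_sentences: list[int], related: list[int], nodes: Nodes) -> list[list[int]]:
    """Step S106: every extracted body text with its chain of branching sources."""
    bodies = sorted({n for n in [*first_sentences, *related] if nodes[n][0] == "body"})
    return [branch_path(n, nodes) for n in bodies]


first = [4, 11, 12, 14]  # first sentences of document 100 (step S103)
related = method_1(first, NODES)
print(related)  # [14, 16]
for path in build_list(first, related, NODES):
    print(" > ".join(f"Sentence {n}" for n in path))
```

Only the chapter "3. Market problem" (sentence 12) passes the check of step S104, so the body texts 14 and 16 beneath it become related sentences; sentence 14 is also a first sentence in its own right, and the set in `build_list` keeps it from appearing twice.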
[0094] Next, the related sentence extracting method 2 will be described.
[0095] Among the twelve sentences of the document 101, the sentences 1 to 10 are common to the sentences 1 to 10 of the document 100 in
[0096] Sentence 11: The paper wrinkle problem has occurred in evaluation.
[0097] Sentence 12: Countermeasures have been carried out, but horizontal expansion to other themes is required
[0099] The sentences including the key word illustrated in
[0100] In the document 100, two or more sentences are not branched from a sentence of the immediately upper hierarchy than the hierarchy to which the body text belongs (see
[0101] When the sentence 11, which is the first sentence, branches from a certain sentence, another sentence that is in the same hierarchy as the sentence 11 and branches from the same branching source sentence is highly likely to supplement the contents of the sentence 11. Since the sentence 12 is in the same hierarchy as the sentence 11 and branches from the branching source sentence of the sentence 11, the sentence 12 is extracted as the related sentence.
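The rule of method 2 reduces to a sibling search in the tree: other body texts with the same branching source as a first sentence in the lowest hierarchy. In this sketch, document 101 is encoded with sentences 1 to 10 as in document 100, and sentences 11 and 12 are ASSUMED to be body texts under sentence 10 ("2-2 Theme B"), since their exact position is not given in the surviving text. The encoding and the names `NODES_101` and `method_2` are hypothetical.

```python
from typing import Optional

Nodes = dict[int, tuple[str, Optional[int]]]

# Hypothetical encoding of document 101: sentence number -> (kind, branching source).
# Sentences 11 and 12 are ASSUMED to be body texts under sentence 10.
NODES_101: Nodes = {
    1: ("header", None), 2: ("chapter", None), 3: ("section", 2), 4: ("body", 3),
    5: ("section", 2), 6: ("body", 5), 7: ("chapter", None), 8: ("section", 7),
    9: ("body", 8), 10: ("section", 7), 11: ("body", 10), 12: ("body", 10),
}


def method_2(first_sentences: list[int], nodes: Nodes) -> list[int]:
    """Steps S204-S205: other body texts sharing a first sentence's branching source."""
    related: list[int] = []
    for first in first_sentences:
        kind, source = nodes[first]
        # This embodiment applies method 2 only to the lowest hierarchy.
        if kind != "body" or source is None:
            continue
        for number, (other_kind, other_source) in nodes.items():
            if (other_kind == kind and other_source == source
                    and number not in first_sentences and number not in related):
                related.append(number)
    return related


print(method_2([4, 11], NODES_101))  # [12]
```

Sentence 4 has no sibling under "1-1 Theme A", so only sentence 12 is returned; a sibling that is itself a first sentence is not listed again as a related sentence.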
[0103] The list in
[0106] Next, among the plurality of sentences, each sentence including a key word registered in advance is extracted as a first sentence (step S203). For each extracted first sentence, it is checked whether another body text branches from the same branching source sentence as that first sentence (step S204); when there is no such body text (step S204; No), the processing proceeds to step S206.
[0107] When there is the other body text (step S204; Yes), the sentence of the body text is acquired as a related sentence (step S205).
[0108] Thereafter, a list is created and stored in which the extracted first sentences and related sentences that are body texts are collected together with information on the upper hierarchy that is the branching source of each body text (step S206), and the processing ends.
[0109] In the above, the embodiment of the present invention has been described with reference to the drawings; however, the specific configuration is not limited to that illustrated in the embodiment, and even a configuration including changes and additions within the scope not deviating from the gist of the present invention is also included in the present invention.
[0110] In the embodiment of the present invention, the server 10 serves as the sentence extracting apparatus of the present invention; however, the sentence extracting apparatus is not limited thereto. For example, another device such as the PC 5 or an MFP may serve as the sentence extracting apparatus. In addition, a program causing an information processing apparatus to operate as the server 10 in the embodiment also falls within the present invention.
[0111] The method of extracting the first sentence from the document is not limited to that described in the embodiment of the present invention. The key words are not limited to those described in the embodiment of the present invention. The predetermined range starting from the first sentence is not limited to that described in the embodiment of the present invention. A related sentence may be extracted by a method other than the related sentence extracting method 1 and the related sentence extracting method 2, as long as it is a method of extracting a sentence in a range highly likely to be related to the first sentence.
[0112] In the embodiment of the present invention, a list is created by extracting, for each extracted body-text sentence, its branching source sentences in order toward the upper hierarchy; however, only the first sentences and related sentences may be output as the extraction result, without creating the list.
[0113] In the embodiment of the present invention, the document is limited to a document having a hierarchical structure (tree structure); however, the document may have no hierarchical structure. In that case, for example, the sentences before and after a sentence extracted as a first sentence may be extracted as related sentences.
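For a document without a hierarchical structure, the fallback can be sketched as a window around each first sentence. The `window` parameter and the name `neighbours` are hypothetical; `window=1` takes only the sentences immediately before and after each first sentence, and sentences that are already first sentences are not repeated as related sentences.

```python
def neighbours(first_sentences: list[int], total: int, window: int = 1) -> list[int]:
    """Sentences within `window` positions of any first sentence (1-based numbers)."""
    related: set[int] = set()
    for first in first_sentences:
        for number in range(first - window, first + window + 1):
            if 1 <= number <= total and number not in first_sentences:
                related.add(number)
    return sorted(related)


print(neighbours([4, 11], total=16))  # [3, 5, 10, 12]
```

Clipping to the range 1 to `total` keeps the window valid at the start and end of the document.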
[0114] According to an embodiment of the present invention, with the sentence extracting apparatus and the program of the present invention, a sentence in a document having a hierarchical structure can be weighted by considering information other than the sentence.
[0115] Although embodiments of the present invention have been described and illustrated in detail, the disclosed embodiments are made for purposes of illustration and example only and not limitation. The scope of the present invention should be interpreted by the terms of the appended claims.