NON-TRANSITORY COMPUTER READABLE MEDIUM AND INFORMATION PROCESSING APPARATUS AND METHOD
20170249299 · 2017-08-31
CPC classification: G06F3/04847; G06F40/58; G06V30/414
International classification: G06F3/0484
Abstract
A non-transitory computer readable medium storing a translation program causes a computer to execute a process. The process includes: displaying image information, text regions, and original text in association with each other, the text regions being obtained by extracting regions including an image of text from the image information, the original text being obtained by performing character recognition on the text included in the text regions; and editing the text regions in accordance with the content of a received operation.
Claims
1. A non-transitory computer readable medium storing a translation program causing a computer to execute a process, the process comprising: displaying image information, text regions, and original text in association with each other, the text regions being obtained by extracting regions including an image of text from the image information, the original text being obtained by performing character recognition on the text included in the text regions; and editing the text regions in accordance with the content of a received operation.
2. The non-transitory computer readable medium according to claim 1, wherein, in the editing of the text regions, selecting of a text region among the displayed text regions or a portion of the displayed original text is received as part of the content of the received operation.
3. The non-transitory computer readable medium according to claim 2, wherein, in the editing of the text regions, merging or dividing of the selected text region or a text region corresponding to the selected portion of the displayed original text is performed in accordance with the content of the received operation.
4. The non-transitory computer readable medium according to claim 1, the process further comprising: calculating a reliability degree of each of the text regions on the basis of the content of the original text extracted from a corresponding text region, wherein a text region for which the reliability degree is calculated to be lower than a predetermined threshold is displayed as a text region candidate to be corrected.
5. The non-transitory computer readable medium according to claim 2, the process further comprising: calculating a reliability degree of each of the text regions on the basis of the content of the original text extracted from a corresponding text region, wherein a text region for which the reliability degree is calculated to be lower than a predetermined threshold is displayed as a text region candidate to be corrected.
6. The non-transitory computer readable medium according to claim 3, the process further comprising: calculating a reliability degree of each of the text regions on the basis of the content of the original text extracted from a corresponding text region, wherein a text region for which the reliability degree is calculated to be lower than a predetermined threshold is displayed as a text region candidate to be corrected.
7. The non-transitory computer readable medium according to claim 1, the process further comprising: calculating a reliability degree of each of the text regions on the basis of the content of the original text extracted from a corresponding text region; and estimating a new text region candidate on the basis of an area of a new text region obtained by merging text regions for which the reliability degree is calculated to be lower than a predetermined threshold and on the basis of a total area of the text regions which have not been merged, wherein the estimated new text region candidate is displayed.
8. The non-transitory computer readable medium according to claim 2, the process further comprising: calculating a reliability degree of each of the text regions on the basis of the content of the original text extracted from a corresponding text region; and estimating a new text region candidate on the basis of an area of a new text region obtained by merging text regions for which the reliability degree is calculated to be lower than a predetermined threshold and on the basis of a total area of the text regions which have not been merged, wherein the estimated new text region candidate is displayed.
9. The non-transitory computer readable medium according to claim 3, the process further comprising: calculating a reliability degree of each of the text regions on the basis of the content of the original text extracted from a corresponding text region; and estimating a new text region candidate on the basis of an area of a new text region obtained by merging text regions for which the reliability degree is calculated to be lower than a predetermined threshold and on the basis of a total area of the text regions which have not been merged, wherein the estimated new text region candidate is displayed.
10. An information processing apparatus comprising: a display controller that displays image information, text regions, and original text in association with each other, the text regions being obtained by extracting regions including an image of text from the image information, the original text being obtained by performing character recognition on the text included in the text regions; and a text region editor that edits the text regions in accordance with the content of a received operation.
11. An information processing method comprising: displaying image information, text regions, and original text in association with each other, the text regions being obtained by extracting regions including an image of text from the image information, the original text being obtained by performing character recognition on the text included in the text regions; and editing the text regions in accordance with the content of a received operation.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] Exemplary embodiments of the present invention will be described in detail based on the following figures.
DETAILED DESCRIPTION
First Exemplary Embodiment
(Configuration of Information Processing Apparatus)
[0027] The information processing apparatus 1 includes a control device 10, a storage device 11, and a communication device 12. The control device 10, which is constituted by, for example, a central processing unit (CPU), controls the individual elements of the information processing apparatus 1 and also executes various programs. The storage device 11 is constituted by a storage medium, such as a flash memory, and stores information therein. The communication device 12 communicates with an external source via a network.
[0028] By executing a translation program 110, which will be discussed later, the control device 10 is able to serve as a document receiver 100, a text region extractor 101, a text recognizer 102, a translator 103, a display controller 104, a text region editor 105, a reliability degree calculator 106, and a text region estimator 107.
[0029] The document receiver 100 receives document information from an external source via the communication device 12. The document information is image information representing characters and images, or image information obtained by scanning printed matter that includes characters and images.
[0030] If text is included in the image information received by the document receiver 100, the text region extractor 101 extracts regions where text items are disposed, as text regions. The text region extractor 101 registers the coordinates, height, and width of each of the extracted text regions in text region information 111 of the storage device 11.
[0031] The text recognizer 102 recognizes the text included in each of the text regions extracted by the text region extractor 101 by using, for example, an optical character recognition (OCR) technique, so as to generate text information. The text recognizer 102 also registers the generated text information in the text region information 111 as the original text.
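The records registered in the text region information 111 can be sketched as follows. This is a minimal illustration only; the field names and the `TextRegion` type are assumptions for the sake of the example, not the data layout of the actual apparatus.

```python
from dataclasses import dataclass

@dataclass
class TextRegion:
    # Geometry of the extracted region within the page image,
    # as registered by the text region extractor 101.
    x: int        # x coordinate of the top-left corner
    y: int        # y coordinate of the top-left corner
    width: int
    height: int
    # Original text filled in by the text recognizer 102 (OCR result).
    original_text: str = ""

# A minimal stand-in for the text region information 111:
# one record per extracted text region.
text_region_information = [
    TextRegion(x=60, y=10, width=8, height=12, original_text="C"),
    TextRegion(x=70, y=10, width=8, height=12, original_text="o"),
]
```

The coordinates shown here follow the list-entry format used later in the description (for example, [C: 60, 10, 8, 12]).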
[0032] The translator 103 translates the text information generated by the text recognizer 102 as the original text into another language so as to generate translated text.
[0033] The display controller 104 displays an operation screen, and also displays the received image information and the original text in association with each other by referring to the text region information 111.
[0034] The text region editor 105 merges text regions or divides a text region automatically or in accordance with the content of an operation performed by a user.
[0035] The reliability degree calculator 106 calculates, as an index, a reliability degree representing how reliable the character string included in a text region is as a character string.
[0036] The text region estimator 107 merges text regions having a low reliability degree or divides a text region having a low reliability degree. Approaches to merging text regions and dividing a text region will be discussed later.
[0037] In the storage device 11, the translation program 110, the text region information 111, and reliability degree information 112 are stored. The translation program 110 causes the control device 10 to operate as the document receiver 100, the text region extractor 101, the text recognizer 102, the translator 103, the display controller 104, the text region editor 105, the reliability degree calculator 106, and the text region estimator 107.
[0038] The information processing apparatus 1 is connected to a terminal (not shown) via the communication device 12, and executes processing in response to a request from the terminal and transmits the processing result to the terminal. The terminal receives the processing result and displays a screen indicating the processing result on a display. The screen displayed on the terminal will be discussed later.
(Operation of Information Processing Apparatus)
[0039] The operation of the information processing apparatus of the first exemplary embodiment will now be described.
[0040] A user first accesses the information processing apparatus 1 by using a terminal (not shown) and requests the information processing apparatus 1 to translate a document.
[0041] The information processing apparatus 1 receives a request from the user and causes a display of the terminal to display the following screen.
[0043] The screen 104a includes a selection field 104a.sub.1 for selecting a source (original) language, a selection field 104a.sub.2 for selecting a target language, a selection field 104a.sub.3 for selecting a document to be translated, and a selection field 104a.sub.4 for selecting a page of the document to be translated. The screen 104a also includes buttons 104a.sub.5, one for instructing the execution of translation and the other for canceling the request.
[0044] The user selects one of the source languages in the selection field 104a.sub.1, selects one of the target languages in the selection field 104a.sub.2, specifies the document to be translated (document information) in the selection field 104a.sub.3, selects a page of the document to be translated in the selection field 104a.sub.4, and then presses the button 104a.sub.5 for instructing the execution of translation.
[0046] In step S1, the document receiver 100 receives document information concerning a document specified in the selection field 104a.sub.3. The document information is, for example, the following image information.
[0048] The image information 100a has a title “5. Correlation of diagram of seller and buyer” and a table constituted by items “Key Issue” and “environment”.
[0049] Then, in step S2, the text region extractor 101 specifies the first page of the image information 100a received by the document receiver 100 as the subject page. In step S3, the text region extractor 101 extracts, as text regions, regions where text items are disposed in the first page. The text regions may be extracted by using a technique such as layout structure recognition in a document image.
[0051] Text regions 101a.sub.1 through 101a.sub.15 are extracted from the image information 100a by the text region extractor 101. A text region is normally extracted word by word or phrase by phrase, as in the text regions 101a.sub.10 and 101a.sub.12. In some cases, however, a text region is extracted character by character, as in the text regions 101a.sub.1 and 101a.sub.2; as a set of words that does not make sense as a phrase, as in the text region 101a.sub.11; or as a word or a set of words at a position at which one phrase is split inappropriately, as in the text regions 101a.sub.13 through 101a.sub.15.
[0052] Then, in step S4, the text recognizer 102 recognizes the text included in each of the text regions extracted by the text region extractor 101 by using, for example, an OCR technique, thereby generating the text region information 111.
[0054] The text region information 111 indicates the original text obtained as a result of performing character recognition on the images within the text regions, the coordinates, height, and width of each of the extracted text regions, and the image of the text included in each of the text regions.
[0055] Then, in step S5, the display controller 104 displays the received image information and the original text in association with each other by referring to the text region information 111, and receives an editing operation on the displayed image information from the user. Details of step S5 are described below.
[0057] In step S50, the display controller 104 displays the document information (image information) and the original text on the following screen.
[0059] The screen 104b includes a document display region 104b.sub.1 in which the received image information and the extracted text regions are displayed, and a character recognition result display region 104b.sub.2 in which the original text is displayed. The screen 104b also includes a page switching button 104b.sub.3 for switching the page if there are plural pages in the document information, merge check boxes 104b.sub.4 for selecting text regions to be merged together, and a selection frame 104b.sub.5 to be displayed in accordance with the check result of the merge check boxes 104b.sub.4. The screen 104b also includes a divide button 104b.sub.6 for dividing a text region by using a cursor, which will be discussed later, a merge button 104b.sub.7 for merging the text regions for which the merge check boxes 104b.sub.4 are checked, a translation button 104b.sub.8 for translating the original text in all the text regions, and a cancel button 104b.sub.9 for closing the screen 104b.
[0060] If one of the selection frame 104b.sub.5 and the merge check boxes 104b.sub.4 is selected, the other one is automatically selected. That is, if a text region in the document image is selected by using the selection frame 104b.sub.5, the corresponding merge check boxes 104b.sub.4 are checked, so that the user can identify how (as which characters) the text included in the selected text region is recognized. If a certain portion of the original text is selected by using the corresponding merge check box 104b.sub.4, the selection frame 104b.sub.5 appears within the document image, so that the user can identify in which text region the selected portion of the original text is included.
[0061] If the user checks merge check boxes 104b.sub.4 on the screen 104b, the text region editor 105 receives the input of the merge check boxes 104b.sub.4 in step S51.
[0062] Then, if the user presses the merge button 104b.sub.7, the text region editor 105 receives the operation for pressing the merge button 104b.sub.7 in step S52. Then, in step S53, the text region editor 105 merges the text regions corresponding to the selected merge check boxes 104b.sub.4 and updates the text region information 111. Then, in step S54, the text region editor 105 updates the display content of the screen 104b.
[0064] In accordance with the above-described operation, the text regions selected by using the selection frame 104b.sub.5 are merged together, so that the character string within the merged text region reads “Correlation diagram of seller and buyer”. An OCR button 104b.sub.10 appears in the portion corresponding to the merged text region in the character recognition result display region 104b.sub.2. If the initial character recognition result for the merged text region is not correct, the user presses the OCR button 104b.sub.10 so that the text recognizer 102 can recognize the text in the merged text region again.
[0065] In another example, if the user presses the divide button 104b.sub.6, the text region editor 105 receives the operation for pressing the divide button 104b.sub.6 in step S52. Then, in step S53, the text region editor 105 divides the corresponding text region and updates the text region information 111. Then, in step S54, the text region editor 105 updates the display content of the screen 104b.
[0067] The selected text region is divided at the position specified by using the cursor, and the text region information 111 and the display content are updated accordingly.
[0068] Then, if the user judges that the text regions are appropriately merged or divided, the user presses the translation button 104b.sub.8.
[0069] In step S8, the translator 103 translates the original text in all the text regions into the target language so as to generate translated text.
Second Exemplary Embodiment
[0070] A second exemplary embodiment of the invention will be described below. The second exemplary embodiment is different from the first exemplary embodiment in that it is determined whether text regions extracted by the text region extractor 101 would be more suitable if merged or divided and, if so, a new text region is estimated. The configuration of the second exemplary embodiment is similar to that of the first exemplary embodiment, and an explanation thereof will thus be omitted.
[0071] In a manner similar to the first exemplary embodiment, the information processing apparatus 1 executes steps S1 through S4 described above.
[0073] The reliability degree calculator 106 calculates the reliability degree of each text item in the original text by making the following determinations. If a text item is a character or a word, it is determined whether or not the character or word is included in a prepared dictionary. If a text item is a sentence, it is determined whether or not it is grammatically correct. The calculated reliability degrees are registered in the reliability degree information 112.
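The dictionary-based part of the determination described above can be sketched as follows. This is a toy scoring function, not the apparatus's actual calculation; the scaling to 0–100 and the tokenization are assumptions, and the grammatical check for sentences is omitted.

```python
def reliability_degree(text, dictionary):
    """Toy reliability score: the fraction of whitespace-separated
    tokens found in a prepared dictionary, scaled to 0-100.
    (The grammatical-correctness check for sentences is omitted.)"""
    tokens = text.split()
    if not tokens:
        return 0
    hits = sum(1 for t in tokens if t.lower() in dictionary)
    return round(100 * hits / len(tokens))

# Illustrative dictionary; a real implementation would use a full lexicon.
vocab = {"correlation", "diagram", "of", "seller", "and", "buyer"}

reliability_degree("Correlation diagram of seller and buyer", vocab)  # 100
reliability_degree("C", vocab)  # 0
```

Under such a scheme, a region holding a single stray character (such as “C”) scores low and becomes a candidate to be corrected, while a region holding a complete phrase scores high.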
[0074] The display controller 104 may display the content of the reliability degree information 112.
[0075] Then, the text region estimator 107 creates all possible combinations of thirteen text items for which the reliability degree is calculated to be equal to or lower than a predetermined threshold (for example, 50). The combinations of the text items are formed into a list 107a.
[0077] The number of combinations included in the list 107a is 2.sup.13=8192.
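The 2.sup.13=8192 figure counts every subset of the thirteen low-reliability text items. A minimal sketch of the enumeration (the item names are placeholders):

```python
from itertools import combinations

# Thirteen placeholder text items whose reliability degree is at or
# below the threshold; the real items are the extracted characters/words.
low_reliability_items = [f"item{i}" for i in range(13)]

# Enumerate every subset (the empty subset included), which yields
# 2**13 = 8192 combinations in total.
all_combinations = [
    combo
    for k in range(len(low_reliability_items) + 1)
    for combo in combinations(low_reliability_items, k)
]

len(all_combinations)  # 8192
```

In practice an implementation would likely prune this exhaustive list (for example, to spatially adjacent items) before the area comparison described below, but the description gives the full count.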
[0078] Then, the text region estimator 107 assumes all the combinations of the text items included in the list 107a as new text regions. The text region estimator 107 then adds the coordinates at the top left of each new text region and the width and height of each new text region to the list 107a as information, thereby creating a new list 107b.
[0080] Each entry of the list 107b thus specifies a candidate new text region by the coordinates at its top left together with its width and height.
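The geometry of a merged text region can be derived as the bounding box of its member regions, consistent with the list-entry format [C: 60, 10, 8, 12] (x, y, width, height) used in the examples that follow. A minimal sketch, where the second example's coordinates are illustrative assumptions:

```python
def bounding_box(regions):
    """Top-left coordinates, width, and height of the smallest
    rectangle enclosing all given (x, y, w, h) regions."""
    x0 = min(x for x, y, w, h in regions)
    y0 = min(y for x, y, w, h in regions)
    x1 = max(x + w for x, y, w, h in regions)
    y1 = max(y + h for x, y, w, h in regions)
    return (x0, y0, x1 - x0, y1 - y0)

# A single-member combination maps to its own region, as with [C: 60, 10, 8, 12]:
bounding_box([(60, 10, 8, 12)])  # (60, 10, 8, 12)

# Two hypothetical regions on the same line merge into a wider box:
bounding_box([(60, 10, 8, 12), (122, 10, 8, 12)])  # (60, 10, 70, 12)
```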
[0081] Then, the text region estimator 107 calculates, from the list 107b, the area of the new text region and the total area of the original text regions included in the new text region, and compares the two areas with each other.
[0083] The area of a new text region 107d.sub.1 corresponding to the topmost line of the list 107b [C: 60, 10, 8, 12] is 8×12=96. The area of the text region 107c.sub.1 of the original text “C” included in the new text region 107d.sub.1 is 8×12=96. The ratio between the two areas is 96/96=1.
[0084] If the ratio between the two areas is equal to or higher than a predetermined threshold (for example, 0.7), the text region estimator 107 determines the new text region 107d.sub.1 to be a new text region candidate.
[0086] The area of a new text region 107d.sub.2 corresponding to [C, o, r, r, e, l, a, t, i, o, n: 60, 10, 70, 12] in the list 107b is 70×12=840. The total area of the text regions 107c.sub.1 through 107c.sub.11 of the original text items “C”, “o”, “r”, “r”, “e”, “l”, “a”, “t”, “i”, “o”, and “n” included in the new text region 107d.sub.2 is 750. The ratio between the two areas is 750/840=0.89.
[0087] If the ratio between the two areas is equal to or higher than the predetermined threshold (for example, 0.7) and is not 1, the text region estimator 107 determines the new text region 107d.sub.2 to be a new text region candidate and eliminates the new text region 107d.sub.1, for which the ratio is calculated to be 1, from the text region candidates. The reason for this is that the user can more easily identify text in a region of a larger area.
[0089] The area of a new text region 107d.sub.3 corresponding to [C, r, using its high: 60, 10, 200, 60] in the list 107b is 200×60=12000. The total area of the text regions 107c.sub.1, 107c.sub.3, and 107c.sub.11 of the original text items “C”, “r”, and “using its high” included in the new text region 107d.sub.3 is 1200. The ratio between the two areas is 1200/12000=0.10.
[0090] If the ratio between the two areas is lower than the predetermined threshold (for example, 0.7), the text region estimator 107 does not set the new text region 107d.sub.3 to be a new text region candidate.
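The three worked examples above (ratios 96/96=1, 750/840=0.89, and 1200/12000=0.10) can be sketched as a simple filter. This is a simplification of the per-region comparison in the description: it keeps merges whose ratio meets the threshold and, when a larger qualifying merge (ratio below 1) exists, drops the ratio-1 single-region entries it subsumes.

```python
def area(region):
    x, y, w, h = region
    return w * h

def candidate_ratio(new_region, member_regions):
    """Ratio of the total area of the original text regions to the
    area of the merged region that encloses them."""
    return sum(area(r) for r in member_regions) / area(new_region)

def select_candidates(ratios, threshold=0.7):
    """Keep merges whose ratio is at or above the threshold; when a
    larger merge (ratio < 1) qualifies, drop the ratio-1 entries,
    since the user can more easily identify text in a larger region."""
    kept = [r for r in ratios if r >= threshold]
    if any(r < 1 for r in kept):
        kept = [r for r in kept if r < 1]
    return kept

ratios = [96 / 96, 750 / 840, 1200 / 12000]  # 1.0, 0.89..., 0.10
select_candidates(ratios)  # only the 750/840 merge survives
```

With the values from the description, the 1200/12000 merge falls below the 0.7 threshold, and the single-character 96/96 entry yields to the larger 750/840 merge, matching the outcome stated for the regions 107d.sub.1 through 107d.sub.3.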
[0091] Concerning all the new text regions in the list 107b, the text region estimator 107 performs the above-described calculations, and, as a result, new text region candidates are determined.
[0093] The text region estimator 107 estimates new text region candidates 107e.sub.1 through 107e.sub.3 by performing the above-described calculations.
[0094] The text region estimator 107 may also merge a text region having a sufficiently high reliability degree with the new text region candidates 107e.sub.1 through 107e.sub.3.
[0096] For example, if the y coordinate at the top left of the new text region candidate 107e.sub.1 matches that of an adjacent text region having a high reliability degree, the two regions may be merged together.
[0097] Similarly, if the x coordinate at the top left of the new text region candidate 107e.sub.3 matches that of an adjacent text region having a high reliability degree, the two regions may be merged together.
[0098] Then, the display controller 104 causes the merge check boxes 104c.sub.4 on a screen 104c to be checked in accordance with the estimated new text region candidates.
[0100] The configuration of the screen 104c is similar to that of the screen 104b described above.
[0101] The display controller 104 may display the content of these screens on a display of the terminal.
Other Exemplary Embodiments
[0102] The present invention is not restricted to the above-described exemplary embodiments, and various modifications may be made without departing from the spirit of the invention.
[0103] In the above-described exemplary embodiments, the functions of the document receiver 100, the text region extractor 101, the text recognizer 102, the translator 103, the display controller 104, the text region editor 105, the reliability degree calculator 106, and the text region estimator 107 of the control device 10 are implemented by a program. Alternatively, all or some of these functions may be implemented by hardware, such as an application-specific integrated circuit (ASIC). The program used in the above-described exemplary embodiments may be stored in a recording medium such as a compact disc-read only memory (CD-ROM) and be provided. Additionally, swapping, deletion, and addition of steps discussed in the above-described exemplary embodiments may be performed without departing from the spirit of the invention.
[0104] The foregoing description of the exemplary embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to practitioners skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to understand the invention for various embodiments and with the various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.