A METHOD AND A SYSTEM FOR PROVIDING AN EXTRACT DOCUMENT

20190361962 · 2019-11-28

    Abstract

    A method and a system for providing an extract document from a source document, the source document being a classified document, the method including the steps of: a) providing the source document in a computer readable format, b) selecting at least one item from the source document, c) establishing an identifying data set to identify the at least one item that has been selected, d) validating the at least one item that has been selected, e) providing the extract document in a fixed format by performing an irreversible conversion of the source document, based on the source document and the identifying data set for the at least one item that has been validated.

    Claims

    1. A method of providing an extract document from a source document, said source document being a classified document, said method comprising the steps of a) providing said source document in a computer readable format, b) selecting at least one item from said source document, c) establishing an identifying data set to identify said at least one item that has been selected, d) validating said at least one item that has been selected, e) providing the extract document in a fixed format by performing an irreversible conversion of said source document, based on said source document and said identifying data set for said at least one item that has been validated.

    2. The method according to claim 1, wherein steps b) and c) are repeated for said source document, before step d) is performed for the source document in its entirety.

    3. The method according to claim 1, wherein step d) of validating said at least one item that has been selected comprises acknowledging the at least one selected item or rejecting the at least one item that has been selected.

    4. The method according to claim 3, wherein step b) and step c) are repeated subsequent to step d) and prior to step e).

    5. The method according to claim 1, wherein step e) of providing the extract document by performing an irreversible conversion of said source document, based on said source document and said identifying data set for said at least one item that has been validated comprises masking in the extract document said at least one item that has been validated.

    6. The method according to claim 1, wherein said identifying data set by means of which said at least one item that has been selected and/or validated is identified, is stored together with a source document identification.

    7. The method according to claim 1, wherein said irreversible conversion according to step e) comprises conversion of the source document being in an intermediate extract version with the at least one item that has been validated masked off into an image document.

    8. The method according to claim 1, wherein said source document is provided as a text document.

    9. The method according to claim 1, wherein said at least one item that has been selected from said source document may be one of a word, a plurality of words in sequence, a paragraph, a box and combinations of the above.

    10. The method according to claim 9, wherein said box may comprise a picture, an image, a drawing, a diagram and/or a word.

    11. The method according to claim 1, wherein said step b) of selecting at least one item from said source document is facilitated by one of using a focusing functionality using e.g. OCR recognition, marking a plurality of words, a paragraph and/or a document area.

    12. A system for providing an extract document from a source document using a method according to claim 1, said system comprising a computer apparatus, display means and input means, said system being configured for displaying said source document on said display means, facilitating at least one item from said source document to be selected in a manner without amending the source document, establishing an identifying data set to identify said at least one item that has been selected, facilitating a validation process of said at least one item that has been selected, and providing the extract document upon a completed validation process by performing an irreversible conversion of said source document, based on said source document and said identifying data set for said at least one item that has been validated.

    13. The system according to claim 12, wherein said system is configured for storing said identifying data set by means of which said at least one item that has been selected and/or validated is identified, together with a source document identification.

    14. The system according to claim 12, wherein said system is configured for facilitating selection of at least one item from said source document by one of using a focusing functionality using e.g. OCR recognition, and marking a plurality of words, a paragraph and/or a document area.

    15. The system according to claim 12, wherein said system is configured for performing said irreversible conversion by a conversion of the source document being in an intermediate extract version with the at least one item that has been validated masked off into an image document.

    16. A computer program product comprising computer readable instructions for carrying out all of the steps of the method of claim 1, when the computer program product is executed on a suitable computer system.

    17. The method according to claim 7, wherein the irreversible conversion according to step e) comprising conversion of the source document in the intermediate extract version into an image document is followed by a conversion into a portable document format.

    18. The system according to claim 15, wherein the system that is configured for performing the irreversible conversion by a conversion of the source document in the intermediate extract version into an image document, furthermore is configured for performing a subsequent conversion into a portable document format.

    Description

    THE FIGURES

    [0077] The invention will be explained in further detail below with reference to the figures of which

    [0078] FIG. 1 shows an example of a workflow according to an embodiment of the invention,

    [0079] FIG. 2 shows a further example of a workflow according to an embodiment of the invention,

    [0080] FIG. 3 illustrates an example of a graphical user interface for an extract application program according to an embodiment of the invention, and

    [0081] FIG. 4 illustrates further exemplary embodiments according to the invention.

    DETAILED DESCRIPTION

    [0082] In FIG. 1 an example of a workflow according to an embodiment of the invention is shown. According to this example of a workflow, an extract application program is activated and from this application program a source document is loaded (at 1) into a suitable computer apparatus or computer device, e.g. a laptop computer, a stationary computer, etc., and displayed to the user on a corresponding display means. The source document may be a document that is to be forwarded to a person who has requested access to a file in which the source document is contained. The source document, which may be in a text format, may be loaded into the computer apparatus from e.g. an electronic archive, or the document may be scanned and loaded into the computer apparatus. Other manners of providing and loading the source document may be used as well.

    [0083] When the source document has been loaded and displayed on the display means, the user can search (at 2) the document for certain words, names, abbreviations, acronyms, numbers, etc., e.g. by using an OCR method for detecting certain words. The search can be initiated using input means such as keyboard, computer mouse, or other computer input means. Furthermore, one or more of the OCR recognized words can be focused by navigating to the word using keyboard or computer mouse. When an OCR recognized word is focused by the application program, the word will be marked using e.g. a first marking colour, enhancement or the like to indicate that the word is an OCR recognized word.
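    The search at 2 can be illustrated with a minimal sketch. It assumes the OCR step has already produced a list of recognized words in reading order; the function name `find_matches` and the punctuation handling are hypothetical and not taken from the description.

```python
def find_matches(ocr_words, terms):
    """Return the positions of OCR recognized words that match a search term.

    Matching ignores case and surrounding punctuation (an assumption of
    this sketch, not a requirement stated in the description).
    """
    wanted = {term.casefold() for term in terms}
    matches = []
    for position, word in enumerate(ocr_words):
        bare = word.strip(".,;:!?()\"'").casefold()
        if bare in wanted:
            matches.append(position)
    return matches
```

    The returned positions could then be focused one by one and shown with the first marking.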

    [0084] The focused words can subsequently (at 3) be reviewed and selected, which is indicated by a marking using e.g. a second marking colour, enhancement or the like that is different from the first marking to indicate that the user has selected the one or more words.

    [0085] Furthermore, when two or more OCR recognized words, which are placed next to each other are selected, the words as well as the space between the words are marked as an unbroken marking.
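    One way to obtain such an unbroken marking is to merge the bounding boxes of neighbouring selected words. The sketch below is only an illustration under assumptions: the `WordBox` type, its fields and the rule "consecutive reading-order index on the same text line" are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    """Hypothetical OCR recognized word with its bounding box on the page."""
    index: int  # position of the word in reading order
    line: int   # text line the word sits on
    x0: float
    y0: float
    x1: float
    y1: float


def merge_adjacent(selected):
    """Merge selected words that are neighbours on one line into single boxes.

    The merged box spans from the first word to the last, so the space
    between the words is covered as well.
    """
    merged = []
    for word in sorted(selected, key=lambda w: w.index):
        last = merged[-1] if merged else None
        if last and last["line"] == word.line and last["end"] + 1 == word.index:
            last["x1"] = max(last["x1"], word.x1)
            last["y0"] = min(last["y0"], word.y0)
            last["y1"] = max(last["y1"], word.y1)
            last["end"] = word.index
        else:
            merged.append({"line": word.line, "end": word.index,
                           "x0": word.x0, "y0": word.y0,
                           "x1": word.x1, "y1": word.y1})
    return [(m["x0"], m["y0"], m["x1"], m["y1"]) for m in merged]
```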

    [0086] Further, other manners of selecting items from the source document are provided for as indicated at 4. For example, in a paragraph mode a plurality of OCR recognized words can be selected by e.g. the computer mouse, by means of which a box can be defined, covering the plurality of words in e.g. a paragraph. According to another example, other items than OCR recognized words can be selected in a box mode, whereby a box can be defined by e.g. the computer mouse, which box can cover such items as images, drawings, diagrams, words that have not been OCR recognized, etc.

    [0087] As it will be explained in further detail below in connection with FIG. 2, the markings of the selected items in the document can be saved using a save functionality. The source document remains unamended, but data for identifying the marked items are saved in an intermediate or temporary file together with an identification of the source document. When the work is resumed, the respective source document is reloaded together with the intermediate or temporary file containing the data for identifying the marked items.

    [0088] Returning to FIG. 1, the application program provides at 5 a validating function, where e.g. a supervisor or the like can review the selected, and thus marked, items in the document.

    [0089] On completion of the validation at 5, the resulting extract document can be generated at 6 in that the selected and validated items are masked, e.g. completely covered, replaced or the like with black colour, e.g. by a black box, to fully prevent any part of the items from being recognizable, and an irreversible conversion is made, e.g. into an image document, to prevent any information about the selected, validated and masked items from being retrievable from the resulting extract document.

    [0090] Subsequent to this, the resulting extract document in image format may at 7 be converted into a portable document format (pdf) to facilitate the handling and forwarding of the resulting extract document to the person or third party that has requested the access to the document.
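    The masking and irreversible conversion at 6 can be sketched as follows. This is a simplified illustration, not the implementation of the extract application program: the page is assumed to have been rendered already into a grid of grayscale pixel values, the function names are hypothetical, and the plain-text PGM image format is used only because it needs no external library. The subsequent conversion into a portable document format at 7 is not shown.

```python
def mask_page(pixels, boxes):
    """Return a copy of the page raster with every box overwritten by black (0).

    pixels: list of rows, each a list of grayscale values 0..255.
    boxes:  (x0, y0, x1, y1) tuples in pixel units, x1 and y1 exclusive.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    out = [row[:] for row in pixels]  # the input raster stays unamended
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                out[y][x] = 0
    return out


def to_pgm(pixels):
    """Serialise the raster as a plain-text PGM (P2) image."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    rows = [" ".join(str(value) for value in row) for row in pixels]
    return "P2\n{} {}\n255\n{}\n".format(width, height, "\n".join(rows))
```

    Because the pixel values under each box are overwritten rather than covered by a separate layer, and the output carries no text layer, the masked content cannot be read back from the resulting image.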

    [0091] FIG. 2 shows a workflow essentially as discussed in connection with FIG. 1, but it is furthermore exemplified here that, in connection with the searching 2 and the reviewing and selecting 3, 4, the user can freely jump between the various steps, as indicated by the return loops 9.

    [0092] Also, it is shown in FIG. 2 that in connection with the validating function 5, where e.g. a supervisor or the like can review the selected, and thus marked, items in the document, it is possible for the supervisor to either approve (yes) or disapprove (no) the selected items in the document. In the latter case the person having made the work can amend or correct, i.e. as indicated by the punctuated return loop 10 that allows the user to return to a prior step.
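    The approve/disapprove decision at 5 can be modelled as a small status check. The sketch below is an assumption-laden illustration: the `Status` values, the function names and the rule that extraction is allowed only once every selected item has been approved are hypothetical choices, not details given in the description.

```python
from enum import Enum


class Status(Enum):
    SELECTED = "selected"   # marked by the user, not yet reviewed
    APPROVED = "approved"   # supervisor answered "yes"
    REJECTED = "rejected"   # supervisor answered "no"


def review(item_ids, decisions):
    """Apply the supervisor's decisions (item id -> True/False) to the items."""
    statuses = {}
    for item_id in item_ids:
        if item_id not in decisions:
            statuses[item_id] = Status.SELECTED
        elif decisions[item_id]:
            statuses[item_id] = Status.APPROVED
        else:
            statuses[item_id] = Status.REJECTED
    return statuses


def ready_for_extraction(statuses):
    """True only when there is at least one item and all items are approved."""
    return bool(statuses) and all(s is Status.APPROVED for s in statuses.values())
```

    A rejected item corresponds to the return loop 10: the user goes back to a prior step, amends the selection, and the review is repeated.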

    [0093] Further, a save functionality 8 is shown, whereby it is possible in connection with each step to save the work already performed, e.g. the markings of the selected items in the document can be saved using this save functionality. By this save functionality the source document remains unamended, but e.g. data for identifying the marked items are saved in an intermediate or temporary file together with an identification of the source document. When the work is resumed, the respective source document is reloaded together with the intermediate or temporary file containing the data for identifying the marked items. The work can be resumed at the same step as where it was saved, but in essence it may be resumed at any of the steps 2, 3 and 4.

    [0094] As indicated, it can also be possible for the supervisor in connection with the validating function 5 to use the save functionality 8 as indicated by punctuated lines.
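    A minimal sketch of the save functionality 8 is given below, assuming that the markings are serialised as JSON and that the source document is identified by a simple string; both are assumptions of this example, as are the function names. Writing the resulting text to the intermediate or temporary file is left to the caller.

```python
import json


def dump_markings(source_id, items):
    """Serialise the markings together with the source document identification.

    Only the identifying data is stored; the source document itself is
    neither read nor modified here.
    """
    return json.dumps({"source_id": source_id, "items": items}, sort_keys=True)


def load_markings(text, expected_source_id):
    """Restore the markings, refusing data saved for another source document."""
    data = json.loads(text)
    if data["source_id"] != expected_source_id:
        raise ValueError("saved markings belong to a different source document")
    return data["items"]
```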

    [0095] FIG. 3 illustrates an example of a graphical user interface for an extract application program according to an embodiment of the invention, where an editor 20 and a viewer 40 are shown.

    [0096] The editor comprises for example a key 22 for opening a source document, e.g. for finding and loading the document, a key 24 for saving the work performed, e.g. by saving the data relating to the work in an intermediate or temporary file together with an identification only of the source document, a key 26 for selecting an item in the source document and a key 28 for performing an extraction on the document.

    [0097] The user will initiate the work in the editor 20 by finding, loading and opening the respective source document, which in FIG. 3 is shown as a relatively simple example 32a. The user may subsequently proceed by searching for items such as words, selecting one or more of these and/or selecting other items by marking these with boxes as indicated by the source document in the selected version 32b.

    [0098] Subsequent to a validation having been performed and by operating the extract key 28, the extract document 42 will be shown in the viewer 40 with the respective selected and validated items blackened out with black boxes 44.

    [0099] FIG. 4 illustrates further exemplary embodiments of the method and the system according to the invention. Here, it is shown that in connection with step a) of providing a source document in a computer readable format, e.g. a pdf-format, the source document is e.g. searched and loaded 50 from a source such as a database DB1.

    [0100] Subsequent to this, the work related to the searching and selecting 52 of items in the source document and step c) of establishing an identifying data set to identify the one or more items that has/have been selected 54 involves a database DB2, e.g. a database in connection with the extract application program, in which database DB2 the identifying data set, by means of which the one or more items that has/have been selected is/are identified, is stored together with a source document identification. The identifying data set may be established in various manners or forms, e.g. an item may be identified by a page number in the source document and coordinates on the page, etc. The name of the source document may also be part of the identifying data set or sets, e.g. together with the size of the source document, to further ensure a safe identification of the correct source document by comparison of size.
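    As an illustration of one possible form of the identifying data set, the sketch below stores a page number and coordinates per item, together with the name and size of the source document. The type and field names are hypothetical, and the size is assumed to be a byte count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemRef:
    """One selected item: page number and box coordinates on that page."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class IdentifyingDataSet:
    """Selected items plus the identification of the source document."""
    source_name: str
    source_size: int  # assumed to be the file size in bytes
    items: tuple


def matches_source(data_set, name, size):
    """Check name and size before applying the data set to a loaded document."""
    return data_set.source_name == name and data_set.source_size == size
```

    Comparing the size in addition to the name guards against applying saved markings to a different document that happens to carry the same file name.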

    [0101] Thus, the source document remains unamended, i.e. due to the selected items being identified by an identifying data set, which is separate from the source document as provided from and stored in the database DB1. Further, in this way it is made possible that the work can be interrupted and resumed later, e.g. by reloading the source document from DB1 and the separately stored identifying data set for the items already selected from DB2.

    [0102] Finally, it is shown in FIG. 4 that the step d) of validating the selected items at 56 and the step e) of performing the extraction on the document at 58 are performed in interaction with a further database DB3, e.g. a database related to the extract application program, wherein the extract document is stored.

    [0103] The extract document may be automatically renamed when it is stored in a database, e.g. DB3. The renaming may be performed e.g. by adding a letter to the name of the source document such as X-name.pdf or by changing the name of the source document entirely for example with a file name generator. A person performing the extraction of the source document may also manually rename the extract document when storing it in a database.
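    Both automatic renaming options can be sketched briefly. The function names are hypothetical, and the use of a random UUID as the "file name generator" is an assumption of this example rather than something specified in the description.

```python
import uuid
from pathlib import PurePath


def prefixed_name(source_name, prefix="X-"):
    """Derive the extract name by adding a prefix to the source file name."""
    return prefix + PurePath(source_name).name


def generated_name(source_name):
    """Replace the name entirely with a generated one, keeping the extension."""
    return uuid.uuid4().hex + PurePath(source_name).suffix
```

    For example, `prefixed_name("name.pdf")` gives `X-name.pdf`, whereas `generated_name("name.pdf")` yields a name that reveals nothing about the source document's title.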

    [0104] The databases DB2 and DB3 may be located on separate data storage devices in the same place or in different places with data links between the devices or may be located on one data storage device in different storage areas of the device.

    [0105] In the above description, various embodiments of the invention have been described with reference to the drawings, but it is apparent to a person skilled in the art that the invention can be carried out in an infinite number of ways, using e.g. the examples disclosed in the description in various combinations, and within a wide range of variations within the scope of the appended claims.

    LIST OF REFERENCE NUMBERS

    [0106] 1 Source document is loaded

    [0107] 2 Searching and focusing

    [0108] 3 Reviewing and selecting

    [0109] 4 Other manners of selecting

    [0110] 5 Validating

    [0111] 6 Generating extract document by irreversible conversion

    [0112] 7 Converting into a portable document format

    [0113] 8 Save functionality

    [0114] 9 Return loop

    [0115] 10 Return loop from validation step

    [0116] 20 Editor at extract application program

    [0117] 22 Key for opening a source document

    [0118] 24 Key for saving the work performed

    [0119] 26 Key for selecting an item

    [0120] 28 Key for performing an extraction on the document

    [0121] 32a Source document

    [0122] 32b Source document in selected version

    [0123] 40 Viewer at extract application program

    [0124] 42 Extract document shown in viewer

    [0125] 44 Selected and validated items masked/replaced with black boxes

    [0126] 50 Providing source document, step a)

    [0127] 52 Selecting items in document, step b)

    [0128] 54 Establishing data set to identify selected items, step c)

    [0129] 56 Validating selected items, step d)

    [0130] 58 Performing extraction on document, step e)