Method for marking visuals of information for subsequent identification or authentication
11606475 · 2023-03-14
Cpc classification
G06T2201/0062
PHYSICS
G06T2201/0061
PHYSICS
G06V10/751
PHYSICS
G06T1/0028
PHYSICS
G06T2201/0081
PHYSICS
G06T1/005
PHYSICS
H04N1/32229
ELECTRICITY
G06F21/64
PHYSICS
International classification
H04N1/32
ELECTRICITY
Abstract
A method for authenticating digital information includes obtaining, in digital form, information for authentication; preparing the information for processing, such preparation including converting the information into a digital image; identifying segments of content in the digital image; grouping the segments of content into one or more segment groups; generating a marking sequence comprising shifting at least one of the one or more segment groups in one or more directions; and applying the marking sequence to the digital image, creating a unique marked copy of the digital image.
Claims
1. A method for authenticating digital information comprising the steps of: obtaining a digital image file; obtaining object grouping information from one or more operators from the image file, the object grouping information defining one or more object groups; generating a marking sequence comprising shifting at least one of the one or more object groups in one or more directions; and applying the marking sequence to the digital image, creating a unique marked copy of the digital image.
2. The method of claim 1, further comprising the steps of: preparing the digital information for processing, such preparation comprising rendering a digital image; identifying segments of content in the digital image; and grouping the segments of content into the one or more object groups.
3. The method of claim 2, wherein the marking sequence further comprises altering a spacing between two or more of the segments in at least one of the one or more segment groups.
4. The method of claim 1, wherein at least one of the marking sequence, the unique marked copy, and the digital image file is stored in digital form in a database.
5. The method of claim 1, further comprising the steps of: obtaining a digital image for comparison; employing an authentication process to authenticate the digital image for comparison by comparing it to the unique marked copy; and generating an authenticity conclusion report identifying the digital image for comparison as authentic if the authentication process determines that it matches the unique marked copy and identifying it as inauthentic if it does not.
6. The method of claim 5, wherein the authentication process comprises digitally overlaying the digital image for comparison with the unique marked copy.
7. A method for authenticating a PDF comprising the steps of: obtaining a PDF file; obtaining content grouping information from one or more operators from the PDF file, the content grouping information defining one or more object groups; generating a marking sequence comprising shifting at least one of the one or more object groups in one or more directions; and applying the marking sequence to the PDF file, creating a unique marked copy of the PDF file.
8. The method of claim 7, wherein at least one of the marking sequence, the unique marked copy, and the PDF file is stored in digital form in a database.
9. The method of claim 7, further comprising the steps of: obtaining a digital image for comparison; employing an authentication process to authenticate the digital image for comparison by comparing it to the unique marked copy; and generating an authenticity conclusion report identifying the digital image for comparison as authentic if the authentication process determines that it matches the unique marked copy and identifying it as inauthentic if it does not.
10. The method of claim 9, wherein the authentication process comprises digitally overlaying the digital image for comparison with the unique marked copy.
11. A system for authenticating a digital file comprising: an authentication marking module for generating and applying a marking sequence to the digital file to create a unique marked copy; wherein the authentication marking module obtains content grouping information from one or more operators from the digital file, the content grouping information defining one or more object groups, and generates the marking sequence to shift at least one of the one or more object groups in one or more directions, creating the unique marked copy of the digital image.
12. The system of claim 11, further comprising a preliminary analysis module for obtaining and processing the digital file, the preliminary analysis module operating to obtain the content grouping information from the one or more operators from the digital file and to obtain two or more segments of content within at least one of the one or more object groups.
13. The system of claim 12, wherein the marking sequence also alters a spacing between the two or more of the segments of content within at least one of the one or more object groups.
14. The system of claim 11 further comprising a comparative analysis module for authenticating an image file for comparison, the comparative analysis module obtaining the image file for comparison, initiating an authentication process comprising comparing the image file for comparison with the unique marked copy, and generating an authenticity conclusion report identifying the image file for comparison as authentic if the authentication process determines that it matches the unique marked copy and identifying it as inauthentic if it does not.
15. The system of claim 14, wherein the authentication process comprises digitally overlaying the digital image for comparison with the unique marked copy.
16. The system of claim 14 further comprising at least one database, wherein at least one of the marking sequence, the unique marked copy, the digital file, the image file for comparison, and the authenticity conclusion report is stored in the at least one database.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
DETAILED DESCRIPTION OF THE INVENTION
(15) The essence of the given method is illustrated by figures, where:
(16) The given method includes the following stages:
(17) Stage 1. Preliminary analysis (analysis and structuring) of the marked information (document, drawing, scheme, etc.)
(18) Stage 2. Marking of the image and the formation of a unique copy (which has unique geometric characteristics of graphic objects), with the registration of marking parameters and information about the recipient of the image (subject) in the journal.
(19) Stage 3. Digital image analysis of a fragment of a marked copy (for example: a photograph of the image on a screen or in printed form, a scanned copy of the printed image, or a computer screenshot) and comparison with the registered marked images to identify the corresponding journal entry.
(20) The present method is carried out using the novel computer software “LeaksID”. The software is one realization of the method; other realizations are possible. The software allows the user to upload (transfer to the program) information in digital form (image, document, drawing, diagram, presentation); provides users with uniquely marked copies of the previously uploaded information; accepts for analysis a digital image of the information, in full or as a fragment (a photograph of a screen or of a printed copy, a scanned image of a printed copy, etc.); and identifies the marking in the analyzed image together with the corresponding journal entry.
(21) Although the software implements the given method in a particular form (in accordance with a specific set of use cases), the method does not exclude, and indeed anticipates, other implementations, also in the form of one or more computer programs, that use the claimed marking method but serve different use cases (for example: images may be marked in a hidden (implicit) way, without the user's conscious participation, thereby achieving an additional level of security).
(22) Description of the method steps/stages:
(23) Stage 1. Preliminary Information Analysis
(24) At the first Stage, the information to be marked is transmitted in digital form to the software for analysis and formation of the structure of its visualization. At the same time, the original unmarked information (the original) is stored in a protected storage area, optionally in a format suited to fast marking (for example, a multi-page scanned image of a document may be divided into separate pages).
(25) Marked information means any information that can in one way or another be displayed on a screen or output on a printing device (for example: a raster or vector image, document, diagram, chart, drawing, etc.).
(26) The information visualization structure formation is as follows:
(27) Step 1. A visual representation of the original (visual image) in any form (bitmap, raster image, vector image, etc.) is formed in the computer memory. If the information in the visual display consists of several pages, each page is processed in the same way.
(28) Step 2. The visual image is divided into independent related components. For example, for a regular text document, these could be letters, words, signs, lines, etc. In the framework of the given method, an independent component is considered to be, in a general sense, an area of the visual image that can be moved relative to other “unrelated” components without loss of the meaning of the information and without obvious visual defects. Any method can be used to identify components (for example: OCR, optical character recognition). Additional processing of the visual image can be performed during this step in order to improve the quality of component identification (for example: image binarization performed by any suitable method).
(29) Each component is provided with data for enabling subsequent restoration of its coordinates on a visual image (for example: for raster images, this may be a pair of coordinates (x1, y1; x2, y2) of diagonally opposite angles of the framing rectangle, or coordinates (x, y) of one angle and characteristics of the width and height of the framing rectangle, etc.).
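As a minimal sketch of Step 2, and not the technique used by the software itself, the following Python fragment binarizes a grayscale bitmap and labels 4-connected dark regions as components, recording for each the corner coordinates (x1, y1; x2, y2) of its framing rectangle. The function names, the threshold of 128, and the list-of-rows image representation are illustrative assumptions; OCR or any other suitable technique could take their place.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive pixel coordinates


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map a grayscale bitmap to 1 (ink) / 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def component_boxes(binary: List[List[int]]) -> List[Box]:
    """Return the framing rectangle of every 4-connected ink region."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            x1 = x2 = x
            y1 = y2 = y
            while queue:
                cx, cy = queue.popleft()
                x1, x2 = min(x1, cx), max(x2, cx)
                y1, y2 = min(y1, cy), max(y2, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x1, y1, x2, y2))
    return boxes


def to_xywh(box: Box) -> Tuple[int, int, int, int]:
    """Alternative representation: one corner plus width and height."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1 + 1, y2 - y1 + 1)
```

Either representation allows the component's position on the visual image to be restored later; the choice between them is a matter of convenience.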
(30) Step 3. Forming of the structure of groups of dependent components. During this Step, all components obtained in Step 2 are connected into dependent structures. A dependent structure in the framework of the present method means a set of components that must be moved along a particular axis of the image in the same direction and by the same distance (for example: all characters in one line of the document can only be moved vertically together; or the first words of all lines of a paragraph except the first can be moved horizontally only together).
(31) The grouping of dependent components can be performed by various methods (for example, for a black-and-white image of a document, the document spectrum method described in O'Gorman, L., “The document spectrum for page layout analysis” (Volume: 15, Issue: 11), or any other suitable method). As a result of this step, a structured representation of the geometric characteristics of the visual image is obtained, which is stored in the computer memory and is associated with the original (loaded earlier in Step 1).
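One simple way to form line-level groups can be sketched as follows. This is a plain vertical-overlap heuristic chosen for brevity, not the document spectrum method cited above; the function name and the (x1, y1, x2, y2) box format are assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def group_into_lines(boxes: List[Box]) -> List[List[Box]]:
    """Group component boxes whose vertical extents overlap into text lines."""
    lines: List[List[Box]] = []
    # Visit boxes top to bottom so that each line grows downward.
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        for line in lines:
            top = min(b[1] for b in line)
            bottom = max(b[3] for b in line)
            if box[1] <= bottom and box[3] >= top:
                line.append(box)
                break
        else:
            lines.append([box])  # no overlapping line found: start a new one
    # Within each line, order components left to right.
    return [sorted(line, key=lambda b: b[0]) for line in lines]
```

Each returned list is a candidate dependent structure: its members would be moved vertically together, while individual words inside it could still be moved horizontally.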
(32) Stage 2. Marking the Image and Formation of a Unique Copy
(33) Marking of the image takes place in the background (in advance) or upon request (from a user, from other programs in the case of embedding, or both).
(34) At the marking Stage, on the basis of the structured representation of the geometric characteristics of the original image (obtained in Stage 1), the original visual image is modified by displacing groups and components along different axes while preserving the dependencies (determined in Step 3 of Stage 1).
(35) Examples:
(36) For a scanned image of a plain text document: line i is shifted up by A pixels, line j is shifted down by B pixels, word k in line i is shifted to the right by C pixels, word n in line j is shifted by D pixels, etc.
(37) For a diagram or chart: all blocks of the i-th group are shifted to the right by X points and down by Y points.
(38) For a Word document: for paragraph i, increase the line spacing by 0.1 points; increase space j of paragraph k proportionally by 50%; add s spaces before the t-th word in paragraph m, etc.
(39) Additional Examples:
(40)
1) For a scanned image of a regular text document: line L1 is shifted up by Y1 pixels, line L2 is shifted down by Y2 pixels, word W1 in line L3 is shifted to the right by X1 pixels, word W2 in line L4 is shifted to the left by X2 pixels, etc.
2) For a chart or diagram: all blocks of the group G1 are shifted to the right by X5 points and down by Y5 points.
3) For a Word document: for paragraph P1, line spacing should be increased by 0.1 points; space S1 of paragraph P2 should be increased proportionally by 50%; S2 spaces should be added before word W1 in paragraph P3, etc.
(41) The present method does not limit methods of changing the geometric characteristics of the visual image. The above examples are intended only to clarify the purpose and method of marking.
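For raster images, the displacement of a group can be sketched as a cut-and-paste of the pixels inside each framing rectangle. The fragment below is illustrative only: the function name, the inclusive (x1, y1, x2, y2) box format, and the white background value of 255 are assumptions, and pixels shifted outside the image are simply dropped.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive


def shift_group(image: List[List[int]], boxes: List[Box],
                dx: int, dy: int, background: int = 255) -> List[List[int]]:
    """Return a copy of `image` with every box of the group moved by (dx, dy)."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    # Cut: remember the pixels of each box, then blank the area.
    patches = []
    for x1, y1, x2, y2 in boxes:
        patches.append([row[x1:x2 + 1] for row in image[y1:y2 + 1]])
        for y in range(y1, y2 + 1):
            for x in range(x1, x2 + 1):
                result[y][x] = background
    # Paste: write only ink pixels so neighbouring content is not erased.
    for (x1, y1, _, _), patch in zip(boxes, patches):
        for py, row in enumerate(patch):
            for px, value in enumerate(row):
                ty, tx = y1 + dy + py, x1 + dx + px
                if value != background and 0 <= ty < height and 0 <= tx < width:
                    result[ty][tx] = value
    return result
```

Because all boxes of the group receive the same (dx, dy), the dependency between them is preserved, as the method requires.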
(42) Thus, by making certain changes in the visual image of the original, a copy is produced that has different geometric characteristics from the original.
(43) To achieve uniqueness, the implementing program generates a pseudo-random sequence of offsets (direction and distance) before making changes; the sequence is then applied to the visual image of the original to yield a unique copy. The displacement parameters, depending on the settings and the field of application, are selected so that a person without special technical means and/or lengthy analysis cannot determine what changes were made to the visual image of the original.
(44) In such a way, a unique copy of the visual image is produced without adding any characters or marks (as opposed to other methods) that a person could discover and remove (for example, cut out).
(45) Using transformations based on a specific (non-random) sequence allows additional “implicit” information to be embedded in the original, a so-called steganographic method (for example: coding of bit information through single and double spaces in the document, or single and one-and-a-half line spacing, etc.). In this case, such additional information can be read from the document by a machine. Based on this “additional” information, it is possible to authenticate documents without access to the complete document, using only a portion of it.
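A pseudo-random sequence of offsets might be generated along the following lines. This is a sketch under stated assumptions: the function name, the per-group (dx, dy) format, and the default bound of 2 pixels are hypothetical, and the bound stands in for whatever imperceptibility settings an implementation uses.

```python
import random
from typing import List, Optional, Tuple


def generate_marking_sequence(group_count: int, max_shift: int = 2,
                              seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return one (dx, dy) offset per group, each component within ±max_shift."""
    rng = random.Random(seed)  # a private generator; global state is untouched
    sequence = []
    for _ in range(group_count):
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        sequence.append((dx, dy))
    return sequence
```

Supplying the same seed reproduces the same sequence, which is one way a stored record could later be used to recreate a marked copy. Python's `random` module is not designed for security purposes; where unpredictability matters, `random.SystemRandom` or the `secrets` module would be a more appropriate source, with the resulting sequence stored explicitly.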
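The single/double-space example can be sketched directly on plain text. In this illustration, which is an assumption about one possible encoding rather than the encoding used by the software, each gap between words carries one bit: a double space encodes 1 and a single space encodes 0.

```python
import re


def embed_bits(text: str, bits: str) -> str:
    """Encode `bits` (a string of '0'/'1') in the gaps between words."""
    words = text.split()
    if len(bits) > len(words) - 1:
        raise ValueError("not enough word gaps to carry the payload")
    parts = [words[0]]
    for index, word in enumerate(words[1:]):
        # Gaps beyond the payload stay single-spaced.
        gap = "  " if index < len(bits) and bits[index] == "1" else " "
        parts.append(gap + word)
    return "".join(parts)


def extract_bits(marked: str, count: int) -> str:
    """Read back the first `count` bits from the word gaps."""
    gaps = re.findall(r" +", marked.strip())
    return "".join("1" if len(gap) >= 2 else "0" for gap in gaps[:count])
```

Since every gap is read independently, a fragment of the text still yields the bits carried by the gaps it contains, which is what makes partial-document authentication possible.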
(46) The amount of possible information is directly proportional to the number of blocks on the visual image of the original.
(47) Within the framework of the given program, the sequence on the basis of which the original was marked (by displacing blocks and groups) is stored in a protected storage area and is uniquely associated with the transaction log, which records the fact of each request, its time, and the subject that requested the marked image of the information.
(48) Stage 3. Digital Image Analysis of a Fragment of a Marked Copy
(49) To analyze and identify a marked copy, the program that implements the present method loads the analyzed visual image, i.e. “sample” (for example, a screen picture or a scan image created from a paper document).
(50) Then a process of image structuring takes place (similar to Step 2 of Stage 1).
(51) The original information previously loaded into the program is then selected (the selection can be made by the user or by the program itself on the basis of various methods, for example, full-text search).
(52) Next, a process of “calibration”, i.e. comparisons of various components between the analyzed sample and the original previously loaded into the program, takes place. This process can occur automatically or with the help of a user.
(53) Then, all previously created marked copies are reproduced, based on the log and sequences stored at the Stage 2, and are compared with the loaded sample.
(54) Comparison can take place in various ways, including by overlaying a sample on a previously reproduced copy, in which case the degree of similarity can be calculated by counting overlapping parts of the image (for example: the number of matched pixels for bitmaps).
(55) Before performing the comparison, the sample and the recreated copy can be processed in various ways (for example, binarized).
(56) After the sample has been compared with all the recreated copies and the degree of similarity calculated for each, the results can be counted and sorted to identify the marked copy that corresponds to the analyzed sample, together with the associated log entry.
(57) After the image of the sample has been structured, simple mathematical and statistical operations make it possible to recover, with a certain degree of error, the additional information that was implicitly encoded by steganography into the original visual image.
(58) Information obtained directly from the sample can be used both to identify the marked copy and for authentication.
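The overlay comparison and ranking can be sketched as follows for binarized bitmaps of equal size. The similarity measure used here (ink pixels shared by both images divided by ink pixels present in either) is one possible way of counting overlapping parts and is an assumption, as are the function names, the log-entry identifiers, and the 0.8 acceptance threshold.

```python
from typing import Dict, List, Optional, Tuple

Bitmap = List[List[int]]  # 1 = ink, 0 = background


def overlay_similarity(sample: Bitmap, candidate: Bitmap) -> float:
    """Overlay two binarized bitmaps and score the overlap of their ink pixels."""
    if len(sample) != len(candidate) or any(
            len(a) != len(b) for a, b in zip(sample, candidate)):
        raise ValueError("bitmaps must have identical dimensions")
    both = either = 0
    for row_s, row_c in zip(sample, candidate):
        for s, c in zip(row_s, row_c):
            both += 1 if (s and c) else 0
            either += 1 if (s or c) else 0
    return both / either if either else 1.0


def identify_copy(sample: Bitmap, copies: Dict[str, Bitmap],
                  min_score: float = 0.8
                  ) -> Tuple[Optional[str], List[Tuple[str, float]]]:
    """Rank recreated copies by similarity; return the best log entry, if any."""
    ranked = sorted(
        ((entry, overlay_similarity(sample, bitmap)) for entry, bitmap in copies.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    best = ranked[0][0] if ranked and ranked[0][1] >= min_score else None
    return best, ranked
```

In practice the sample would first be calibrated (scaled and aligned) against the original, since a photograph or scan rarely matches the recreated copy pixel for pixel.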
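As a hedged illustration of recovering implicitly encoded bits from a sample, suppose the gaps between words have been measured in pixels after structuring, and the nominal widths of a single and a double space at the sample's scale are known from calibration. Each measured gap can then be assigned to the nearer nominal width. The function name and the nominal-width parameters are assumptions made for this sketch.

```python
from typing import List


def decode_gaps(widths: List[float], single: float, double: float) -> str:
    """Classify each measured gap as '0' (single space) or '1' (double space)."""
    return "".join(
        "1" if abs(width - double) < abs(width - single) else "0"
        for width in widths
    )
```

Measurement noise from scanning or photographing moves the widths away from their nominal values, which is why the recovered information carries a certain degree of error; redundancy in the encoded payload would be one way to tolerate it.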
(59) Referring now to the figures,
(60) The system preferably stores the original data or information 12, the unmarked digital image 14, the one or more marking sequences 16, and the one or more unique marked copies 18 in a database or similar digital data storage structure and preferably associates the stored information with identifying information about the user that submitted the original data or information 12 and the circumstances surrounding that submission, such as a date/time stamp, method of submission, etc. The present invention may then compare a subsequently obtained digital image with the stored marking sequences 16 using an authentication process 20 to determine if the subsequently obtained digital image is an authentic copy and, preferably, identify the user that initiated the matching marking sequence 16 to determine if the subsequently obtained digital image is an authorized copy, if the source of the subsequently obtained digital image is associated with said user, etc.
(61) Preferable embodiments of the authentication process 20 include overlaying the subsequently obtained digital image using the one or more unique marked copies 18. The authentication process 20 may be employed using only a fragment of the subsequently obtained digital image or may compare the entire subsequently obtained digital image to only a fragment of the one or more unique marked copies 18. Those of ordinary skill in the art will recognize the various available means of authentication of the subsequently obtained digital image.
(62)
(63)
(64)
(65)
(66) Referring now to
(67) Referring next to
(68)
(69) Upon receipt, the preliminary analysis module 42 determines if the original data or information 12 is in an appropriate digital image 14 format. If not, the preliminary analysis module converts the original data or information 12 into an appropriate digital image format 14. The preliminary analysis module 42 then preferably stores the original data or information 12, the original digital image 14, and/or the information relating to the user and the user's submission in a database 34 for later reference and creates a copy of the original digital image 14 for processing.
(70) The system 40 next analyzes the digital image 14 copy and identifies the various segments or fragments 36 found therein. This step may be performed by either the preliminary analysis module 42, the authentication marking module 44, or the two in combination. For example, the preliminary analysis module 42 may perform an optical character recognition (OCR) process on the digital image 14 copy, and the authentication marking module 44 may then isolate and identify the various segments or fragments 36 in the document. Or, either the preliminary analysis module 42 or the authentication marking module 44 may do both.
(71) In any case, once the segments or fragments 36 are isolated and identified, the authentication marking module 44 groups them into one or more segment groups 38. Some preferable embodiments may base such groupings on, for example, the nature of the individual segments 36 (e.g. text, tables, shapes, other objects), location within the document (e.g. segments 36 of text on a single line are grouped together, etc.), and/or mobility relative to other segments or fragments 36. Those of ordinary skill in the art will recognize the various approaches to creating the segment groups 38.
(72) With the segment groups 38 established, the authentication marking module 44 can then create the marking sequence 16 and apply it to the digital image 14 copy, creating a unique marked copy 18 of the original digital image 14. Both the marking sequence 16 and the unique marked copy 18 are preferably stored in the database 34. In the case of a user-initiated application of the system 40 in which data about the user and the user's request was recorded and stored, the authentication marking module 44 preferably stores the marking sequence 16 and the unique marked copy 18 in association with said user data. The unique marked copy 18 is then also provided to the user who initiated the request for his or her use.
(73) To authenticate a digital image for comparison 48, the system 40 preferably employs a comparative analysis module 46, though the steps performed by the comparative analysis module 46 could also or alternatively be performed by the authentication marking module 44. The comparative analysis module 46 first obtains the image for comparison 48. The image 48 may be obtained via user submission, using an internet scraping process, or through other electronic means, as will be understood to those of ordinary skill in the art.
(74) The comparative analysis module 46 next preferably performs the authentication process 20. Preferable embodiments authenticate the image for comparison 48 by overlaying it with the one or more unique marked copies 18 previously created by the authentication marking module 44 and stored in the database 34. The comparative analysis module 46 may also perform the authentication process 20 using the original digital image 14, where applicable. Upon identifying a unique marked copy 18 matching the image for comparison 48, preferable embodiments of the comparative analysis module 46 generate an authenticity conclusion report 50 confirming the authenticity of the image for comparison 48. If no match exists, the authenticity conclusion report 50 identifies the image for comparison 48 as an unauthorized copy.
(75) The authenticity conclusion report 50 is preferably provided to the user in the case of a user-initiated authentication process 20 and is also preferably stored in the database 34 for later reference. The authenticity conclusion report 50 preferably includes information on the unique marked copies 18 used in the authentication process 20 and which, if any, matched the image for comparison 48. The authenticity conclusion report 50 may also reflect any other data associated with any matching unique marked copy 18, such as the user data and user request data associated with such unique marked copy 18, where applicable.
(76) In preferable embodiments, upon identifying an unauthorized image for comparison 48, the system 40 may take further action to attempt to remove the unauthorized image 48, such as initiating a take-down request on a third-party website, taking steps to initiate legal proceedings, etc. Those of ordinary skill in the art will recognize the steps that may be automated to protect the authenticity of the digital images 14 submitted to the system 40 for protection. As will be understood by those of ordinary skill in the art, the system may be implemented locally using local installation software or implemented over a network connecting a local device with a remote server.
(77) Referring last to
(78) The exemplary system 60 preferably employs a preprocessing service 70 that can identify and convert, if necessary, any information provided by the user into a digital image file. The preprocessing service 70 may also perform other document processing functions, such as OCR (optical character recognition), etc., and/or a document service 72 may perform such document processing functions. The document service 72 also preferably implements functions to manage the other services of the system and preferably manages registration and storage of information about the digital image file using a service registry 74, a file service 76, and a file storage 78. The file service 76 also preferably provides unified file storage functionality for the system 60, and the service registry 74 preferably implements “service discovery” functionality.
(79) A marking service 80 next processes the digital image provided by the preprocessing service 70, identifying the various segments and/or segment groups found in the digital image file. The marking service 80 then marks the digital image file, as discussed above, creating the unique marked copy, and preferably stores the unique marked copy and the marking sequence in a database 82. A configuration service 84 may assist in developing the marking sequence and preferably provides configuration parameters to the other services employed by the system 60.
(80) An analysis service 86 operates to authenticate an item submitted from the user device 62. As discussed above, the analysis service 86 preferably queries database 82 to obtain digital image data for comparison and performs the authentication process, preferably by overlaying the item submitted for authentication with the digital image data obtained from the database 82. The system's front-end service 68 preferably also provides a message service 88 permitting the system's 60 components to interact with each other in an asynchronous way and preferably also facilitates interaction with the user devices 62 connected via the network 64. Such message service 88 may, for example, provide the authenticity conclusion report created by the analysis service 86 to a user seeking an authenticity confirmation on the item submitted for authentication. As those of ordinary skill will appreciate, the components of system 60 may perform these and other functions to implement the full extent of functionality of the invented method and system.
(81) The Special Case of Image Files
(82) The present invention may process certain non-raster format digital documents, such as PDFs, MS Word files, MS PowerPoint files, etc., somewhat differently from the processing of raster-format digital documents in some preferable embodiments. For example, PDF files are often embedded with character recognition and object grouping functions and operators, which allow the present invention's authentication system and process to mark such files without undertaking the character recognition and object grouping steps, speeding up the authentication process substantially. Visually, the marking of non-raster image files and raster files creates the same output. But from a resource-utilization and processing-speed perspective, the authentication process can be up to ten times faster for non-raster image files, which is highly desirable particularly when processing large amounts of data, as will be recognized by those of skill in the art.
(83)
(84) When outputting the text of a PDF, the framing operators BT and ET are used, within which new output parameters can be set through the Tm operator (text matrix). Thus, all operators between successive Tm operators also form a group, which can be marked by changing the parameters of the Tm operator. Further, all the text is output through the operators Tj and TJ, which output specific text lines. Text lines represent nested groups and are preferably marked using the present invention's method by moving the drawing start point through the Td operator.
(85) Within string values of text, there are non-printable characters (for example, the space character), as well as punctuation marks and special characters (for example, brackets, quotes, or dashes), commonly referred to as delimiters. The runs of characters between delimiters form nested groups (in most cases, simple words). Preferable embodiments of the present invention may move individual words in a PDF by replacing one line-output operator (Tj operator) with an array-of-strings operator (TJ operator), which allows the horizontal offset between words to be increased or decreased.
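Moving a text line through Td can be sketched on an uncompressed content stream as below. Because the operands of Td are offsets from the start of the current line, the sketch also subtracts the same shift from the following Td so that later lines keep their positions. The function name is hypothetical, and the regular expression is a deliberate simplification: a real implementation would need a proper content-stream tokenizer (string literals may themselves contain the characters "Td") and would have to decode compressed streams first.

```python
import re

# Two numeric operands followed by the Td operator.
_TD = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td\b")


def shift_line(content: str, index: int, dx: float, dy: float) -> str:
    """Shift the line started by the index-th Td by (dx, dy) text-space units."""
    matches = list(_TD.finditer(content))
    if not 0 <= index < len(matches):
        raise IndexError("no such Td operator")
    deltas = {index: (dx, dy)}
    if index + 1 < len(matches):
        deltas[index + 1] = (-dx, -dy)  # compensate so later lines stay put
    out = []
    last = 0
    for i, match in enumerate(matches):
        out.append(content[last:match.start()])
        if i in deltas:
            ddx, ddy = deltas[i]
            x = float(match.group(1)) + ddx
            y = float(match.group(2)) + ddy
            out.append(f"{x:g} {y:g} Td")
        else:
            out.append(match.group(0))  # untouched operators are copied verbatim
        last = match.end()
    out.append(content[last:])
    return "".join(out)
```

For example, applying `shift_line(stream, 1, 0.5, 0.3)` to a stream with three Td operators rewrites the second and third while leaving the first, and all text strings, unchanged.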
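The replacement of a Tj operator by a TJ operator can be sketched as follows. In a TJ array, a number between strings is expressed in thousandths of a unit of text space and is subtracted from the current horizontal position, so a negative value widens the following gap and a positive value narrows it. The function names and the `adjustments` mapping (gap index to adjustment) are assumptions made for illustration, and only the space character is treated as a delimiter here.

```python
from typing import Dict


def _pdf_escape(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def tj_to_tj_array(text: str, adjustments: Dict[int, float]) -> str:
    """Rewrite the text of a `(text) Tj` operator as a `[...] TJ` operator.

    `adjustments` maps a gap index (0 = the gap after the first word) to a
    position adjustment in thousandths of a text-space unit.
    """
    words = text.split(" ")
    parts = []
    for i, word in enumerate(words):
        is_last = i == len(words) - 1
        # Keep the space with the preceding word so the text itself is unchanged.
        chunk = word if is_last else word + " "
        parts.append(f"({_pdf_escape(chunk)})")
        if not is_last and adjustments.get(i):
            parts.append(f"{adjustments[i]:g}")
    return "[" + " ".join(parts) + "] TJ"
```

With an empty `adjustments` mapping the output renders identically to the original Tj; each non-zero entry shifts everything after that gap on the line, so the extracted text is unchanged while the geometry carries the mark.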
(86) Each of these operations is used in preferable embodiments of the present invention to mark PDFs and other applicable non-raster image files. By utilizing these operators inherent in the native files, the present invention can omit the steps of the authentication process associated with rendering an image, identifying the individual elements therein, and segmenting and grouping of those individual elements. By omitting these steps and using the grouping inherently present in the non-raster image file itself, preferable embodiments of the present invention can skip directly to generating the sequence of modifications and applying those modifications to the non-raster image file at the file level, substantially improving processing speeds for large amounts of data.
(87)
(88) By omitting pre-marking steps 22, 24, and 26 and replacing them with the simple step of extracting every operator 25 of the image file, and by applying the markings at the PDF level by reassembling the operators to the document 32, the present invention can drastically improve processing speeds and resource allocation, as will be understood to those of skill in the art. From a visual perspective, a document marked using the exemplary process shown in
(89) Processing time and resource efficiency are further improved by omitting the step of persisting the segments and groups in the database 34 as well. When it comes time to authenticate a marked PDF or other non-raster image file that was marked using the more streamlined method described above, the same comparison operation is employed but without the added burden of accessing and processing the segmentation and grouping steps applied and persisted in the database (22, 24, 26, 34) as is necessary for raster-format files.
(90)
(91) To authenticate an image for comparison 48, the system 40 preferably employs a comparative analysis module 46 to assess the image's 48 appearance based upon the marking sequences 16 stored in the database 34, performing the authentication process 20 and generating an authenticity conclusion report 50. If the image for comparison 48 matches one of the saved marking sequences 16 from the database 34, the system 40 generates an authenticity conclusion report 50 determining that the file is authentic. If no match is found, the authenticity conclusion report 50 would identify the image for comparison 48 as an unauthorized copy. This entire process is performed far more rapidly with respect to PDF and other non-raster image files by omitting the steps needed for pre-processing raster-format files, including rendering the image 22, defining image segments 24, defining segment groups 26, and persisting the segments and groups in the database 34.
(92) While the present invention has been described with reference to particular embodiments and arrangements of components, features, and the like, it is not limited to those embodiments or arrangements. Modifications and variations will be ascertainable to those of skill in the art, all of which are inferentially and inherently included in these teachings.