IMAGE PROCESSING AND MACHINE LEARNING-BASED EXTRACTION METHOD
20230073775 · 2023-03-09
Inventors
- Nathalie Goldstein (San Jose, CA, US)
- Joachim Niederreiter (Vienna, AT)
- DeGuang Sea (Maria Lanzendorf, AT)
- Markus Finster (Vienna, AT)
CPC classification
- G06V30/19013 (Physics)
- G06V30/1463 (Physics)
- G06F16/58 (Physics)
- G06V30/414 (Physics)
Abstract
A system for file image processing and extraction of content from images is provided. The system comprises a computer and an application. When executed on the computer, the application receives a source document containing areas of interest and normalizes the document to align with a stored template image. The application also applies metadata associated with the template image to the areas of interest to identify data fields in the normalized document and extracts data from the identified data fields. The application also processes the extracted data using at least character recognition systems and produces a static structure using at least the identified data fields, the fields containing the processed data. The areas of interest comprise portions of the source document containing text needed to create and populate fields suggested by the stored template image. Normalizing the source document comprises at least one of flipping, rotating, expanding, and shrinking the document.
Claims
1. A system for file image processing and extraction of content from images, comprising: a computer; and an application executing on the computer that: receives a source document containing areas of interest, normalizes the document to align with a stored template image, applies metadata associated with the template image to the areas of interest to identify data fields in the normalized document, extracts data from the identified data fields, processes the extracted data using at least character recognition systems, and produces a static structure using at least the identified data fields, the fields containing the processed data.
2. The system of claim 1, wherein the areas of interest comprise portions of the source document containing text needed to create and populate fields suggested by the stored template image.
3. The system of claim 1, wherein normalizing the source document comprises at least one of flipping, rotating, expanding, and shrinking the document.
4. The system of claim 1, wherein the metadata identifies the data fields at least partially aligning with fields suggested by the template image.
5. The system of claim 1, wherein the static structure is constructed to align with the template image.
6. The system of claim 1, wherein the static structure is used to create a stored record based at least partially on the template image.
7. The system of claim 1, wherein the template image suggests the static structure and mandates at least some data fields needed by the stored record.
8. The system of claim 1, wherein the source document is image-based and contains graphics, the graphics containing at least some of the data fields.
9. The system of claim 1, wherein the metadata preserves structure lost during use of character recognition systems.
10. A method of adapting material from an unfamiliar document format to a known format, comprising: a computer determining features in a source document and a template that at least partially match; the computer applying a normalization algorithm to the source document; the computer applying metadata to features in the source document to identify data fields at least similar to data fields in the template; the computer extracting data from the identified fields using at least optical character recognition tools; and the computer producing a static structure containing the identified data fields and data within the fields, the structure at least partially matching structure of the template.
11. The method of claim 10, wherein normalizing the source document further comprises rotation, scaling, skewing and general positioning correction of the source document.
12. The method of claim 10, wherein the template is used with the metadata and at least one feature detection algorithm to normalize the source document to an orientation and size of a reference image suggested by the template.
13. The method of claim 10, wherein the metadata suggests the location of material to be extracted from the source document.
14. The method of claim 10, wherein the source document is image-based and contains graphics, the graphics containing at least some of the data fields.
15. A system for file image processing and extraction of content from images, comprising: a computer; and an application executing on the computer that: determines that a format of a received document does not conform to a template used for storage of data of a type contained in the received document, normalizes the received document to at least support readability and facilitate identification of fields and data contained within the fields, applies metadata and machine image processing algorithms to identify fields in the source document at least partially matching fields in the template, employs optical character recognition and machine learning techniques that promote semantically accurate data extraction to extract data from the identified fields, and builds a static structure based on the identified fields and extracted data to at least partially conform to the template.
16. The system of claim 15, wherein the metadata identifies fields at least partially aligning with fields suggested by the template image.
17. The system of claim 15, wherein the received document contains graphics and non-textual content.
18. The system of claim 15, wherein the static structure is used to create a stored record based at least partially on the template image.
19. The system of claim 15, wherein the template image suggests the static structure and mandates at least some data fields needed by the stored record.
20. The system of claim 15, wherein normalizing the received document comprises at least one of flipping, rotating, expanding, and shrinking the received document.
Description
BRIEF DESCRIPTION OF THE FIGURES
DETAILED DESCRIPTION
[0030] Systems and methods described herein address the problems described above regarding handling a document with unrecognizable elements or content that may otherwise be difficult to decipher. An application provided herein may be activated when an unfamiliar and at least partially unrecognizable document is received in a transaction or other interaction. The document may be unacceptable for any of the reasons described above. The document may not conform adequately to a template, such that contents of the document, comprising at least fields and contained data, cannot be properly extracted. As noted, the document may contain graphics, images, and non-text content that obscures text or otherwise renders the document unreadable.
[0031] When a document of such unknown format and content is received, the application first determines whether the document contains items of interest. Such items may comprise specific words, numbers, symbols, or other indicia or markings suggesting the document was sent as part of a transaction or other official communication.
[0032] If the document does in fact contain areas of interest, the system may then normalize the document, which consists of aligning the document to make it readable. Such normalization may comprise at least one of flipping, rotating, expanding, and shrinking the document. After normalization, processing and extraction steps take place as described below.
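For illustration only, the basic normalization operations named above (flipping, rotating, expanding, and shrinking) might be sketched as follows. This sketch uses NumPy array operations; the function name and parameters are hypothetical and not part of the disclosed implementation, which may use any image processing library.

```python
import numpy as np

def normalize(image, flip=False, rotations=0, scale=1.0):
    """Apply basic normalization operations: flipping, rotating in
    90-degree steps, and expanding (scale > 1) or shrinking (scale < 1)."""
    if flip:
        image = np.fliplr(image)              # mirror horizontally
    if rotations:
        image = np.rot90(image, k=rotations)  # rotate counterclockwise
    if scale != 1.0:
        # nearest-neighbor resampling to expand or shrink by `scale`
        h, w = image.shape[:2]
        rows = (np.arange(int(h * scale)) / scale).astype(int)
        cols = (np.arange(int(w * scale)) / scale).astype(int)
        image = image[rows][:, cols]
    return image
```

In practice a real implementation would detect the needed correction automatically, as described in the feature-detection steps below, rather than taking the operations as explicit arguments.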
[0033] Techniques and processes for enhancing and normalizing images of documents with a known structure, and therefore known semantics (static structured documents), are provided herein. Systems and methods extract information out of an image using artificial intelligence (optical character recognition, OCR), a template image, and metadata associated with the template image.
[0034] The present disclosure provides for extraction of material from image-based files, for example jpeg and pdf formatted files, and including files that are uploaded into a software application. An intention is to recognize material in the file, extract relevant information, and store the data for future use. By uniquely combining imaging methods, contextual information and OCR, the disadvantages of previous implementations may be mitigated.
[0035] Systems and methods are provided for optimizing image quality with regard to positional constraints. An extraction phase may therefore yield better results. The process described herein may be grouped into three phases: normalization, processing, and extraction.
[0036] As inputs, systems and methods require images to be in electronically processable formats. Examples of such formats comprise bmp, pbm, pgm, ppm, sr, ras, jpeg, jpg, jpe, jp2, tiff, tif, and png.
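A minimal sketch of an input-format check over the listed extensions might look as follows; the constant and function names are hypothetical and purely illustrative.

```python
from pathlib import Path

# Formats listed in the disclosure as electronically processable inputs.
PROCESSABLE_EXTENSIONS = {
    "bmp", "pbm", "pgm", "ppm", "sr", "ras",
    "jpeg", "jpg", "jpe", "jp2", "tiff", "tif", "png",
}

def is_processable(filename: str) -> bool:
    """Return True if the file extension is an accepted input format."""
    return Path(filename).suffix.lstrip(".").lower() in PROCESSABLE_EXTENSIONS
```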
[0037] The provisioning of an input image depends on the implementation of the software but can include:
[0038] Transmission over a physical or logical network and submission to an API.
[0039] Storage on a persistent store which is directly or indirectly accessible by the software.
[0040] Local storage on a persistent store which is directly or indirectly accessible by the software.
[0041] Another prerequisite for systems and methods provided herein to execute the normalization phase is a supplied template image. The template image acts as a reference image against which the input image is normalized; this is done in multiple steps. A purpose of the steps in this phase is to enhance the image for follow-up phases. Steps can be executed sequentially or in parallel where semantically possible.
[0042] Steps may be scaled independently of each other. The main steps provided herein are positional correction steps for correcting rotation, scaling, and skewing. These steps leverage template images in combination with feature detection algorithms (e.g., SIFT, SURF, ORB), a scoring function, and an evaluation function to normalize the input image to the template image.
[0043] The execution of a step may be explained as follows:
[0044] The feature detection function is executed with the input image of the step and the template image to find features present in both images. A user may take the top n detected matching features and conduct a search over each combination of the features. A matching feature consists of the coordinates of a point in the input image and the coordinates of a point in the template image. A combination comprises a pair of features.
[0045] Each combination is scored with a provided scoring function.
[0046] Each combination is evaluated with the provided evaluation function to find the most suitable combination in the search space.
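The step described in paragraphs [0044] through [0046] (pairing matched features into combinations, scoring each combination, and evaluating for the most suitable one) might be sketched as follows. This is a hedged illustration, not the claimed implementation: it assumes a similarity transform (scale and rotation) is derived from each pair of matched features, and that the scoring function sums residual distances over all matches. All function names are hypothetical; a production system would typically obtain the matches from a detector such as ORB and could use robust estimators instead.

```python
import math
from itertools import combinations

def similarity_from_pair(f1, f2):
    """Derive scale, rotation, and an anchor mapping from one combination
    (a pair of matched features). Each matched feature is a tuple
    ((x_input, y_input), (x_template, y_template))."""
    (a_in, a_tpl), (b_in, b_tpl) = f1, f2
    # Vectors between the two feature points in each image.
    vin = (b_in[0] - a_in[0], b_in[1] - a_in[1])
    vtp = (b_tpl[0] - a_tpl[0], b_tpl[1] - a_tpl[1])
    scale = math.hypot(*vtp) / math.hypot(*vin)
    angle = math.atan2(vtp[1], vtp[0]) - math.atan2(vin[1], vin[0])
    return scale, angle, a_in, a_tpl

def score(transform, matches):
    """Scoring function: sum of residual distances after applying the
    transform to every matched input point."""
    scale, angle, origin, target = transform
    c, s = math.cos(angle), math.sin(angle)
    total = 0.0
    for (x, y), (tx, ty) in matches:
        dx, dy = x - origin[0], y - origin[1]
        px = target[0] + scale * (c * dx - s * dy)
        py = target[1] + scale * (s * dx + c * dy)
        total += math.hypot(px - tx, py - ty)
    return total

def best_combination(matches, top_n=10):
    """Evaluation function: search all combinations of the top-n matches
    and keep the transform that fits the full match set best."""
    best = None
    for f1, f2 in combinations(matches[:top_n], 2):
        t = similarity_from_pair(f1, f2)
        s = score(t, matches)
        if best is None or s < best[0]:
            best = (s, t)
    return best[1]
```

The returned transform directly yields the rotation and scaling corrections that the normalization step then applies to the input image.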
[0047] The follow-up phase is the processing phase, which has as a prerequisite metadata that captures the semantics of the data to extract; this metadata has to be supplied in a machine-processable form. The metadata may comprise positional information for the area where specific information is expected, and additional information for step-internal use.
[0048] The processing phase, in combination with the supplied metadata and a search function, is used to identify the area of text which serves as input for the next phase, in which the information is extracted by OCR over the target area. This extraction phase leverages machine learning by using language-agnostic models for recognition.
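Applying the metadata's positional information to crop target areas and hand each one to an OCR engine might be sketched as follows. The OCR engine is passed in as a callable (for example, a thin wrapper around an OCR library); the function name and metadata shape are hypothetical.

```python
def extract_fields(image, metadata, ocr):
    """Crop each area of interest named in the metadata and run the
    supplied OCR callable over it, returning field name -> text.

    `image` is a 2D array-like indexable as image[y0:y1, x0:x1];
    `metadata` maps field names to (x0, y0, x1, y1) pixel boxes."""
    fields = {}
    for name, (x0, y0, x1, y1) in metadata.items():
        region = image[y0:y1, x0:x1]  # positional info from the metadata
        fields[name] = ocr(region)    # e.g. an OCR-library wrapper
    return fields
```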
Operation
[0049] For a structured document to be processed, a base template image is to be provided, which in the case of fillable forms can be an empty form.
[0050] Additionally, areas of interest which contain text to be extracted are to be identified and a metadata file associated with the template is to be accessed or created.
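A hypothetical shape for such a metadata structure, associating named areas of interest with pixel boxes in the template's coordinate frame, might look like this; the field names, file name, and coordinates are illustrative only, and the disclosure does not specify a concrete format.

```python
# Hypothetical metadata associated with one template image.
# Boxes are (x0, y0, x1, y1) pixel coordinates in the template's frame.
TEMPLATE_METADATA = {
    "template": "empty_invoice_form.png",  # reference/template image
    "areas_of_interest": {
        "invoice_number": {"box": (620, 40, 780, 70), "type": "alphanumeric"},
        "issue_date": {"box": (620, 80, 780, 110), "type": "date"},
        "total_amount": {"box": (600, 940, 790, 980), "type": "currency"},
    },
}
```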
[0051] A machine which satisfies the platform constraints of the software needs to be provided. This may depend on the actual implementation of the software.
[0052] Dependencies, libraries, and additional third-party tools may need to be installed and configured. This may include the trained model of the machine learning algorithm for the language of the data that is to be extracted.
[0053] Finally, the software that implements systems and methods provided herein is set up.
[0054] The usage of a reference image herein finds the same structure of a form in the input image. Such usage may identify a region of interest and thereby implicitly identify image correction parameters, which may be a significant issue for OCR.
[0055] The present disclosure may not use the reference image to determine polygons or areas in the input image, but instead may derive image correction parameters, such as skewing or rotation, from the form structure.
[0056] The present disclosure does not spatially analyze the input image or parts of it, but instead performs an image-wide feature detection. Also, classification of unclassified features and establishment of a semantic connection to the spatial template are not necessary given a known form structure.
[0057] The present disclosure may not rely on forms with an optical grid structure, or on aligning input and template images to derive form structure and semantics from such a connection. Systems and methods provided herein may be independent of any geometrical structure in identifying and creating semantic context.
[0058] Turning to the figures:
[0059] The system 100 also comprises template images 106a-n and metadata 108a-n incorporated into each template image 106a-n, respectively, and optical character recognition system 110. While the template images 106a-n, their respective metadata 108a-n and the optical character recognition system 110 are depicted as stored in the server 102, in embodiments these components may be stored elsewhere.
[0060] The system 100 also comprises source documents 112a-n, a database 114, and stored records 116a-n. The database 114 is an optional component as the stored records 116a-n may not be database records and may be stored elsewhere.
[0061] The server 102 may be more than one physical computer that may be situated at more than one geographic location. The application 104 executes on the server 102 and provides much of the functionality described herein. The application 104 may execute on more than one physical computer.
[0062] The source documents 112a-n are documents received for processing that may contain graphics, images, or other content that renders them impossible to process using systems and methods provided by previous implementations. The source documents 112a-n may also be impossible to process as initially received by the server 102 because they were transmitted by the customer or other party in a flipped, mirror-image, or rotated state and thus require normalization as described herein.
[0063] The stored records 116a-n may be a desired end result of systems and methods provided herein. When the application 104 performs the processes described above of normalization, processing, and extraction on a source document 112a and conforms it acceptably to the template 106a such that a static structure may be established, a stored record 116a representing the source document 112a may be created and stored in the database 114 or elsewhere.
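Creating a stored record from the extracted fields, with the template mandating that certain fields be present, might be sketched as follows; the function and parameter names are hypothetical and merely illustrate the check that a source document conforms acceptably to its template.

```python
def build_record(template_name, extracted_fields, required_fields):
    """Build a stored record (a static structure) from extracted field
    data, verifying that every field mandated by the template is present."""
    missing = [f for f in required_fields if f not in extracted_fields]
    if missing:
        raise ValueError(
            f"document does not conform to template: missing {missing}")
    return {"template": template_name, "fields": dict(extracted_fields)}
```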
[0066] In an embodiment, a system for file image processing and extraction of content from images is provided. The system comprises a computer and an application. When executed on the computer, the application receives a source document containing areas of interest and normalizes the document to align with a stored template image. The application also applies metadata associated with the template image to the areas of interest to identify data fields in the normalized document and extracts data from the identified data fields. The application also processes the extracted data using at least character recognition systems and produces a static structure using at least the identified data fields, the fields containing the processed data.
[0067] The areas of interest comprise portions of the source document containing text needed to create and populate fields suggested by the stored template image. Normalizing the source document comprises at least one of flipping, rotating, expanding, and shrinking the document. The metadata identifies the data fields at least partially aligning with fields suggested by the template image. The static structure is constructed to align with the template image. The static structure is used to create a stored record based at least partially on the template image. The template image suggests the static structure and mandates at least some data fields needed by the stored record. The source document is image-based and contains graphics, the graphics containing at least some of the data fields. The metadata preserves structure lost during use of character recognition systems.
[0068] In another embodiment, a method of adapting material from an unfamiliar document format to a known format is provided. The method comprises a computer determining features in a source document and a template that at least partially match. The method also comprises the computer applying a normalization algorithm to the source document. The method also comprises the computer applying metadata to features in the source document to identify data fields at least similar to data fields in the template. The method also comprises the computer extracting data from the identified fields using at least optical character recognition tools. The method also comprises the computer producing a static structure containing the identified data fields and data within the fields, the structure at least partially matching structure of the template. Normalizing the source document further comprises rotation, scaling, skewing and general positioning correction of the source document. The template is used with the metadata and at least one feature detection algorithm to normalize the source document to an orientation and size of a reference image suggested by the template. The metadata suggests the location of material to be extracted from the source document. The source document is image-based and contains graphics, the graphics containing at least some of the data fields.
[0069] In yet another embodiment, a system for file image processing and extraction of content from images is provided. The system comprises a computer and an application executing on the computer that determines that a format of a received document does not conform to a template used for storage of data of a type contained in the received document. The system also normalizes the received document to at least support readability and facilitate identification of fields and data contained within the fields. The system also applies metadata and machine image processing algorithms to identify fields in the source document at least partially matching fields in the template. The system also employs optical character recognition and machine learning techniques that promote semantically accurate data extraction to extract data from the identified fields. The system also builds a static structure based on the identified fields and extracted data to at least partially conform to the template. The metadata identifies fields at least partially aligning with fields suggested by the template image. The received document contains graphics and non-textual content. The static structure is used to create a stored record based at least partially on the template image. The template image suggests the static structure and mandates at least some data fields needed by the stored record. Normalizing the received document comprises at least one of flipping, rotating, expanding, and shrinking the received document.
[0070] It will be readily understood that the components, as generally described herein and illustrated in the figures included, may be arranged and designed in different configurations. Therefore, the description herein of the embodiments of systems and methods as represented at least in the included figures, is not intended to limit the scope of the application as claimed but is merely representative of selected embodiments.