METHOD AND APPARATUS TO LOCATE FIELD LABELS ON FORMS
20240005687 · 2024-01-04
Inventors
CPC classification
G06V30/18086
PHYSICS
International classification
G06V30/412
PHYSICS
Abstract
Method and apparatus to identify field labels from filled forms using image processing to compare two or a few copies of the same kind of form, possibly filled out differently. Filled forms are processed to provide text strings, one for each line of text on each filled form. The text strings are converted to vectors. Vectors from different filled forms are compared to identify common words which are indicative of field labels. In an embodiment, a histogram may be generated to show frequency of occurrence of characters and words, the histogram values also being indicative of field labels.
Claims
1. A computer-implemented method comprising: a. responsive to receiving versions of a filled form as images, processing the images to generate a text string for each line of each version; b. for each version, generating a matrix of rows of text strings; for each row for each version: c. generating a best match for the row of one version with one of the vectors within a predetermined number of rows of the vector for the row of another version; d. constructing a character histogram for characters in the text string in that row; e. constructing a word histogram for words in the text string in that row; the method further comprising: f. responsive to the generating the best match, identifying field labels for the filled forms from the character histograms and the word histograms; and g. generating a blank form with the identified field labels.
2. The computer-implemented method of claim 1, wherein a number of said versions in a. is two or three.
3. The computer-implemented method of claim 1, wherein the predetermined number of rows is three.
4. The computer-implemented method of claim 1, wherein the processing comprises identifying line images in each filled form, and generating a text string from each of the identified line images.
5. The computer-implemented method of claim 1, further comprising, responsive to generating the best match, aligning respective text lines corresponding to the vectors in the best match.
6. The computer-implemented method of claim 5, wherein the aligning comprises scaling text in the respective text lines so that the text in the respective text lines is the same size.
7. The computer-implemented method of claim 6, wherein the scaling comprises identifying characters in the respective text lines that are the same, and performing the scaling on the identified characters.
8. The computer-implemented method of claim 1, wherein generating the best match comprises: a. creating a vector for the text string in that row, using identified words in the text string, and assigning a number in the vector for each identified word in the text string; b. comparing a vector for the row of one version with vectors within a predetermined number of rows of the vector for the row of another version; c. responsive to the comparing, generating the best match.
9. The computer-implemented method of claim 8, further comprising, for each row in each version, identifying words appearing multiple times in each of the text strings, and, for each of the words appearing multiple times, assigning the same number to each position in the vector where the word appears.
10. The computer-implemented method of claim 1, wherein the x-axis of the word histogram represents the words in text lines and the y-axis of the word histogram represents the density of the underlying distribution of the characters in the word, where the range of the density is [0, 1], using the following equation: D=(f.sub.1+f.sub.2+ . . . +f.sub.n)/(m×n), where D is the density, m is a number of text lines, n is a number of characters in the word, and f.sub.i is the number count for the ith character in the word.
11. A system comprising: a processor; and a non-transitory memory storing instructions which, when performed by the processor, perform a method comprising: a. responsive to receiving versions of a filled form as images, processing the images to generate a text string for each line of each version; b. for each version, generating a matrix of rows of text strings; for each row for each version: c. generating a best match for the row of one version with one of the vectors within a predetermined number of rows of the vector for the row of another version; d. constructing a character histogram for characters in the text string in that row; e. constructing a word histogram for words in the text string in that row; the method further comprising: f. responsive to the generating the best match, identifying field labels for the filled forms from the character histograms and the word histograms; and g. generating a blank form with the identified field labels.
12. The system of claim 11, wherein a number of said versions in a. is two or three.
13. The system of claim 11, wherein the predetermined number of rows is three.
14. The system of claim 11, wherein the processing comprises identifying line images in each filled form, and generating a text string from each of the identified line images.
15. The system of claim 11, wherein the method further comprises, responsive to generating the best match, aligning respective text lines corresponding to the vectors in the best match.
16. The system of claim 15, wherein the aligning comprises scaling text in the respective text lines so that the text in the respective text lines is the same size.
17. The system of claim 16, wherein the scaling comprises identifying characters in the respective text lines that are the same, and performing the scaling on the identified characters.
18. The system of claim 11, wherein generating the best match comprises: a. creating a vector for the text string in that row, using identified words in the text string, and assigning a number in the vector for each identified word in the text string; b. comparing a vector for the row of one version with vectors within a predetermined number of rows of the vector for the row of another version; c. responsive to the comparing, generating the best match.
19. The system of claim 11, wherein the method further comprises, for each row in each version, identifying words appearing multiple times in each of the text strings, and, for each of the words appearing multiple times, assigning the same number to each position in the vector where the word appears.
20. The system of claim 11, wherein the x-axis of the word histogram represents the words in text lines and the y-axis of the word histogram represents the density of the underlying distribution of the characters in the word, where the range of the density is [0, 1], using the following equation: D=(f.sub.1+f.sub.2+ . . . +f.sub.n)/(m×n), where D is the density, m is a number of text lines, n is a number of characters in the word, and f.sub.i is the number count for the ith character in the word.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The foregoing and other features and aspects of the present invention now will be described in detail with reference to the accompanying drawings, in which:
DETAILED DESCRIPTION
[0019] In an embodiment, computing system 150 will process filled forms to identify field labels. Computing system 150 may include one or more processors, one or more storage devices, and one or more solid-state memory systems (which are different from the storage devices, and which may include both non-transitory and transitory memory).
[0020] As part of the discernment of field label and value location, computing system 150 may generate bounding boxes around text, using bounding box generation system 160. In an embodiment, computing system 150 may include a text scaling and alignment system 165 to compensate for images of different scale in different filled forms. In an embodiment, storage 175 may store the input images that computing system 150 processes.
[0021] Computing system 150 may be in a single location, with network 155 enabling communication among the various elements in computing system 150. Additionally or alternatively, one or more portions of computing system 150 may be remote from other portions, in which case network 155 may signify a cloud system for communication. In an embodiment, even where the various elements are co-located, network 155 may be a cloud-based system.
[0022] Additionally or alternatively, processing system 190, which may contain one or more of the processors, storage systems, and memory systems referenced above, may implement one or more algorithms to resolve locations for field labels.
[0023] In order to have line-level detection, the images of lines are first extracted using line segmentation algorithms. Ordinarily skilled artisans will be aware of available line segmentation algorithms, and so for the sake of compactness of description line segmentation algorithm details are omitted here.
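Although the description leaves the line segmentation algorithm unspecified, one common approach is a horizontal projection profile over a binarized image. The sketch below is only an illustrative stand-in under that assumption, not the claimed method; the function name and toy image are hypothetical.

```python
# Minimal line segmentation via horizontal projection profile, assuming a
# binarized image where 1 = ink and 0 = background. Illustrative only.

def segment_lines(binary_image):
    """Return (start_row, end_row) pixel bounds for each text line.

    A pixel row containing any ink extends the current line; a run of
    blank rows terminates it.
    """
    profile = [sum(row) for row in binary_image]  # ink count per pixel row
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y                        # a line begins
        elif ink == 0 and start is not None:
            lines.append((start, y - 1))     # the line ends
            start = None
    if start is not None:                    # line runs to the bottom edge
        lines.append((start, len(profile) - 1))
    return lines

# Two "text lines" separated by one blank pixel row.
image = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(segment_lines(image))  # [(0, 1), (3, 3)]
```

Each returned pair bounds one line image, which may then be passed to OCR to produce the text strings discussed below.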
[0025] Field 225 is the diagnosis field, filled in with a description of the diagnosis. Field 230 is the procedure/treatment field, filled in with an identification of procedures and/or treatments to be followed. Field 235 is a begin date field, filled in with a date on which the procedure and/or treatment is to begin. Field 240 is an end date field, filled in with a date on which the procedure and/or treatment is to end. Similarly to fields 210, in
[0027] One aspect of filled forms such as
[0029] In an embodiment, after the line images in
[0030] For example, after the line image 310 in
[0031] In view of the foregoing, in an embodiment it is possible to encode a set of words in a text string as follows.
[0032] 1) The words in the text string may be numbered with successively increasing numbers from left to right, to form a numerical vector representation of the text string. For example, the text string from the line image 310 may be encoded as a vector S.sub.1={1, 2, 3, 4, 5, 6, 2, 7, 8, 9}. That is, the word DATE appears twice in the vector, and so is assigned the same number.
[0033] 2) A vector S.sub.2 may be encoded for the text string 320 in
[0034] In an embodiment, if there are words in vector S.sub.2 that are not in vector S.sub.1, those words may be encoded to follow the number of the last position in vector S.sub.1. In an embodiment, if the second text string is shorter than the first text string, padding zeros may be added to vector S.sub.2 to make vector S.sub.2 the same length as vector S.sub.1. Vector S.sub.2 may be encoded in this way so that a length of a longest increasing subsequence of vector S.sub.2 may be a measure of similarity between vector S.sub.1 and vector S.sub.2, where the longest increasing subsequence is a subsequence of a given sequence in which the subsequence's elements are in sorted order, lowest to highest, and in which the subsequence is as long as possible.
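The encoding and longest-increasing-subsequence (LIS) similarity just described can be sketched as follows. The two text strings are illustrative examples, not the actual lines 310 and 320 of the drawings, and the function names are not from the specification.

```python
# Sketch of the word-to-number encoding and LIS similarity measure.

def encode(words, codebook=None):
    """Number words left to right; a repeated word keeps its first number.

    When `codebook` carries the numbering of a reference line (vector S1),
    known words reuse the reference numbers and unseen words are numbered
    after the last reference number, as described for vector S2.
    """
    codebook = {} if codebook is None else codebook
    next_code = max(codebook.values(), default=0) + 1
    vec = []
    for w in words:
        if w not in codebook:
            codebook[w] = next_code
            next_code += 1
        vec.append(codebook[w])
    return vec, codebook

def lis_length(seq):
    """Length of the longest strictly increasing subsequence (O(n^2))."""
    if not seq:
        return 0
    best = [1] * len(seq)
    for i in range(len(seq)):
        for j in range(i):
            if seq[j] < seq[i]:
                best[i] = max(best[i], best[j] + 1)
    return max(best)

# Two hypothetical copies of the same line: labels recur, fill-ins differ.
line1 = "PATIENT NAME DATE OF BIRTH MEMBER DATE ID GROUP NUMBER".split()
line2 = "PATIENT NAME John Doe DATE OF BIRTH 1990".split()

s1, codebook = encode(line1)
s2, _ = encode(line2, dict(codebook))
s2 += [0] * (len(s1) - len(s2))   # zero-pad the shorter vector
print(s1)              # [1, 2, 3, 4, 5, 6, 3, 7, 8, 9]
print(lis_length(s2))  # 6: the shared label words form a long increasing run
```

Because words common to both lines keep their reference numbers and appear in the same order, a high LIS length signals the same underlying form line even when the filled-in values differ.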
[0035] With the foregoing comparison of two vectors for the same line of text in two different filled forms, it is useful to consider the performance of this comparison for all lines of text in the two different filled forms. For purposes of this comparison, it is appropriate to select, by hand, two copies of the same form, filled out differently. In this way, the text in the two copies may be compared, and the field labels identified.
[0036] Looking back at
[0037] Looking now at the two different copies of the same form, after digitizing and word segmentation, there may be M text strings (corresponding to M line images) from the first image and N text strings (corresponding to N line images) from the second image. M and N may be different for a variety of reasons. For example, there may be additional text lines in one or the other of the images, as
where a.sub.i1 . . . a.sub.iK, a set of words for the first image, is a row in matrix A and b.sub.i1 . . . b.sub.iK, a set of words for the second image, is a row in matrix B, respectively. To determine correspondence between a row in matrix A and a row in matrix B, in an embodiment, the following sequence may be performed:
[0038] 1) Take a row, a.sub.i1 . . . a.sub.iK, from matrix A;
[0039] 2) Find any repeated words in a.sub.i1 . . . a.sub.iK and append _n to the end of each repeated word;
[0040] 3) Encode a.sub.i1 . . . a.sub.iK as a vector as described above;
[0041] 4) Take a set of rows, b.sub.j1 . . . b.sub.jK, from matrix B, where, in an embodiment, j=i−3 to i+3. Depending on the embodiment, a larger or smaller number of rows either before or after the row of focus may be selected. A reason for looking at at least three lines (that is, at least one line before and one line after the line of focus) is to account for possible situations in which the filled-in text associated with a particular field label may occupy different numbers of lines in the respective filled form, as in the examples discussed above with respect to
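The windowed row-correspondence search of steps 1) through 4) can be sketched as follows, assuming matrices A and B are lists of word lists (one per text line). The helper names and sample rows are illustrative, not from the specification.

```python
# Sketch of the windowed best-match search between rows of matrices A and B.

def encode_pair(ref_words, cand_words):
    """Encode both rows; the candidate reuses reference numbers, new words follow."""
    codes, vec_ref = {}, []
    for w in ref_words:
        codes.setdefault(w, len(codes) + 1)   # repeated word keeps first number
        vec_ref.append(codes[w])
    vec_cand = []
    for w in cand_words:
        codes.setdefault(w, len(codes) + 1)   # unseen words numbered afterward
        vec_cand.append(codes[w])
    vec_cand += [0] * (len(vec_ref) - len(vec_cand))  # zero-pad if shorter
    return vec_ref, vec_cand

def lis_length(seq):
    """Length of the longest strictly increasing subsequence (O(n^2))."""
    best = [1] * len(seq) if seq else []
    for i in range(len(seq)):
        for j in range(i):
            if seq[j] < seq[i]:
                best[i] = max(best[i], best[j] + 1)
    return max(best, default=0)

def best_row_match(A, B, i, window=3):
    """Return the row j of B, within i-window..i+window, most similar to A[i]."""
    best_j, best_score = None, -1
    for j in range(max(0, i - window), min(len(B), i + window + 1)):
        _, cand = encode_pair(A[i], B[j])
        score = lis_length(cand)
        if score > best_score:
            best_j, best_score = j, score
    return best_j, best_score

A = [["CLAIM", "FORM"],
     ["PATIENT", "NAME", "DATE", "OF", "BIRTH"],
     ["DIAGNOSIS"]]
B = [["CLAIM", "FORM"],
     ["EXTRA", "NOTE"],
     ["PATIENT", "NAME", "Jane", "DATE", "OF", "BIRTH"],
     ["DIAGNOSIS", "flu"]]
print(best_row_match(A, B, 1))  # (2, 5): row 2 of B best matches row 1 of A
```

The window tolerates insertions such as the extra line in B; rows outside the window are never compared, which keeps the search inexpensive even for long forms.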
[0045] After the ith iteration of the just-described process, resulting in associating two text lines, l1 and l2, respectively, from the two images, the next step is to align l1 and l2 before going on to the next row. Alignment of l1 and l2 may be necessary because of considerations such as differing image scale or font size between the two text lines. To determine the correct alignment, in an embodiment each of l1 and l2 may be segmented into characters or words. Because the text lines have been matched based on a vector match, it is likely that there will be characters or words in common between the two text lines, so that scaling can be carried out. If no common characters or words are found, the implication is that no common keywords exist in these two lines. In this case, the process may go on to handle other pairs of text lines, as described previously.
[0046] In an embodiment, after a certain number of failures of the process in terms of ability to align text lines, it may be presumed that alignment has failed. Such failure may occur because of image quality of one or both of the filled forms being compared, or because of artifacts on one or both of the filled forms, or for other technical reasons which ordinarily skilled artisans will appreciate. In that event, another filled form may be substituted for one of the just-compared filled forms, or another pair of filled forms may be selected, and the process repeated.
[0047] If there are common characters or words between the two text lines l1 and l2, it will be possible to identify a group of characters or words that are common to both l1 and l2, as seeds for calculating the scale S.sub.x and S.sub.y for the two text lines. For example, looking toward the bottom of
[0048] Using the above calculated scales S.sub.x and S.sub.y, all of the bounding boxes of words in l1 can be made to the same scale as in l2. Then, each word in l2 can be aligned to the corresponding word in l1 based on a position of its bounding box as
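The seed-based scale estimation and rescaling of [0047]-[0048] can be sketched as follows, assuming each word carries an axis-aligned bounding box (x, y, width, height). The coordinates and function names are hypothetical.

```python
# Sketch of seed-based scale estimation between two matched text lines.

def estimate_scale(seeds_l1, seeds_l2):
    """Average the width and height ratios over seed pairs -> (S_x, S_y).

    Each element of seeds_l1/seeds_l2 is the bounding box (x, y, w, h) of a
    character or word common to both lines, paired in order.
    """
    n = len(seeds_l1)
    sx = sum(b2[2] / b1[2] for b1, b2 in zip(seeds_l1, seeds_l2)) / n
    sy = sum(b2[3] / b1[3] for b1, b2 in zip(seeds_l1, seeds_l2)) / n
    return sx, sy

def rescale(box, sx, sy):
    """Bring a bounding box from l1 to the scale of l2."""
    x, y, w, h = box
    return (x * sx, y * sy, w * sx, h * sy)

# Hypothetical seeds: two common words, with l2 rendered at twice the size.
seeds_l1 = [(10, 5, 40, 12), (60, 5, 44, 12)]
seeds_l2 = [(20, 10, 80, 24), (120, 10, 88, 24)]
sx, sy = estimate_scale(seeds_l1, seeds_l2)
print(sx, sy)                            # 2.0 2.0
print(rescale((10, 5, 40, 12), sx, sy))  # (20.0, 10.0, 80.0, 24.0)
```

Once every box in l1 is rescaled, words in the two lines can be paired by bounding-box position, as described for the alignment step.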
[0049] Still further, a word histogram can be constructed on top of the character histogram, where the x-axis represents the words in text lines and the y-axis represents the density of the underlying distribution of the characters in the word, where the range of the density is [0, 1], using the following equation:
D=(f.sub.1+f.sub.2+ . . . +f.sub.n)/(m×n), where D is the density of the underlying distribution of the characters in a word; m is a number of text lines; n is a number of characters in the word; and f.sub.i is the number count for the ith character in the word. The character histogram and the word histogram form a histogram hierarchy which can be used to identify field labels given several form images. Groups of adjacent words with higher density values are determined to be the field labels to be extracted. With this extraction, a blank form such as the one shown in
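The histogram hierarchy can be sketched as follows, using the density D=(f.sub.1+ . . . +f.sub.n)/(m×n) reconstructed from the definitions above. One plausible reading, assumed here, is that f.sub.i counts how many of the m aligned lines carry the same character at position i; function names are illustrative.

```python
# Sketch of the character histogram and word-density computation, assuming
# `aligned_words` holds the m aligned occurrences of one word position,
# one string per text line.

def char_histogram(aligned_words):
    """Per character position, count lines agreeing with the first line."""
    hist = []
    for pos, ch in enumerate(aligned_words[0]):
        hist.append(sum(1 for w in aligned_words if pos < len(w) and w[pos] == ch))
    return hist

def word_density(aligned_words):
    """Density D = (f_1 + ... + f_n) / (m * n), in the range [0, 1]."""
    m = len(aligned_words)      # number of text lines
    n = len(aligned_words[0])   # number of characters in the word
    return sum(char_histogram(aligned_words)) / (m * n)

# A field label recurs unchanged across copies; a filled-in value does not.
print(word_density(["DATE", "DATE"]))  # 1.0 -> likely part of a field label
print(word_density(["JOHN", "MARY"]))  # 0.5 -> likely a filled-in value
```

Thresholding the density and grouping adjacent high-density words then yields the field labels used to generate the blank form.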
[0050] The foregoing description has been based on the assumption that the two filled forms that are received at the beginning of the process can be matched, aligned, and scaled, and character and word histograms produced to determine field labels from just those two forms. Sometimes, an additional filled form will need to be added to the comparison. In that event, looking at three images with constructed word matrices A, B and C being built, and identified row correspondence between matrices A and B and between matrices A and C, the following process may be provided:
[0051] 1. Take a row of words from matrix A and from the corresponding row in matrices B and C;
[0052] 2. Align the rows of words from matrices B and C to the row of words from matrix A;
[0053] 3. Build histograms by using the three aligned rows;
[0054] 4. Localize keywords based on the hierarchy histogram for this row;
[0055] 5. Repeat 1 to 4 until all rows are processed.
[0057] At 411, for a first matrix, within a text string, words appearing multiple times are identified. At 413, a vector is created for each text string, using the identified words. At 415, where the same words are identified in the text string, the same number at that position in the vector is assigned.
[0058] At 417, a determination is made as to whether the process is far enough along to be working on a line image that is far enough down the matrix to be able to look a predetermined number of rows (x, an integer) before and after the row being worked on. If not (no at 417), then at 419, the vector for the mth row is used, and flow proceeds to 429, which will be described. If the process is far enough along (yes at 417), then at 421, for another matrix, a number of text strings from m−x to m+x is selected, and at 423, a vector is created for each of those text strings, using identified words in each text string. As before, the same number is assigned to vector positions in each text string that contain the same word for that position.
[0059] At 427, then, a best vector match between row m of the first matrix and rows m−x to m+x of another matrix may be found. (Similarly to the description for matrices A, B, and C, 427 may be a comparison between two matrices, or among three, or in any event, among a very small number compared to the number required to obtain a training set to train an AI-based system.) At 429, text lines are aligned corresponding to the matched vectors.
[0060] Having matched the vectors, at 433 a determination is made as to whether scaling is possible to give the text lines the same scale. If not (no at 433), then at 435 a check is made to see whether all of the lines of the first matrix have been processed. If not (no at 435), then at 437, m is incremented, and flow returns to 411. If all of the lines of the first matrix have been processed (yes at 435), then at 439, a further version or versions of the same filled form are received, and flow returns to 403.
[0061] If scaling is possible (yes at 433), then at 441, seeds are identified from among characters and/or words for each text line, and a scale is calculated. At 443, the calculated scales are used to give the text lines under comparison the same scale. Here, again, the number of text lines under comparison may be two or three, depending on the number of matrices being compared in the embodiment, or possibly a few more, but in any event, a much smaller number compared to a number required to obtain a training set to train an AI-based system.
[0062] At 445, after scaling has been accomplished, a character histogram may be constructed for characters in the text lines. Then, at 447, a word histogram may be constructed for words in the text lines. After this, at 449, field labels may be identified from the histograms, with the higher numbers as represented along the y axis of the histograms signifying characters in words in the field labels (445), and words in the field labels (447).
[0063] At 451, if all field labels for the form have been identified, (yes at 451), then at 453 a blank form with the field labels may be generated. If not all of the field labels have been generated (no at 451), then at 455 a check is made to see whether all of the lines of the first matrix have been processed. If not (no at 455), then at 437, m is incremented, and flow returns to 411. If all of the lines of the first matrix have been processed (yes at 455), then at 439, a further version or versions of the same filled form are received, and flow returns to 403.
[0064] While aspects of the present invention have been described in detail with reference to various drawings, ordinarily skilled artisans will appreciate that there may be numerous variations within the scope and spirit of the invention. Accordingly, the invention is limited only by the following claims.