Path score calculating method for intelligent character recognition
09977976 · 2018-05-22
Cpc classification
G06V30/18076
PHYSICS
Abstract
Disclosed herein is a method that improves the performance of handwriting recognition by calculating path scores so as to identify the path with the highest score as the basis for interpreting handwritten characters. Specifically, the method comprises the following steps: detecting connected regions in an input image comprising handwritten characters; determining a plurality of segmentation positions of the input image; obtaining a plurality of recognition results for each segment of each path in the input image, wherein each recognition result represents a character candidate for the segment and each path comprises one or more segments; obtaining a plurality of scores corresponding to the recognition results; calculating scores for each path in the input image based on segment lengths and the scores corresponding to the recognition results; and using the path with the highest score to interpret the handwritten characters in the input image.
Claims
1. A method for handwriting recognition, comprising: detecting connected regions in an input image comprising handwritten characters; determining a plurality of segmentation positions of the input image; obtaining multiple alternative paths of the input image, each path containing one or more segments, each path covering all of the connected regions of the input image, the multiple alternative paths being different from each other; for each path: obtaining a plurality of recognition results for the segments of the path, wherein each recognition result represents a character candidate for a corresponding segment; obtaining a plurality of scores corresponding to the recognition results of the segments; and calculating a path score for the path based on segment lengths and the scores corresponding to the recognition results of the segments using the following equation: $S_{\mathrm{path}} = a \cdot (S_{11} \cdot \mathrm{Len}_1 + S_{12} \cdot \mathrm{Len}_2 + \dots + S_{1m} \cdot \mathrm{Len}_m) + b \cdot \min(S_{11}, S_{12}, \dots, S_{1m})$, wherein $\mathrm{Len}_1, \mathrm{Len}_2, \dots, \mathrm{Len}_m$ represent respective segment lengths for $\mathrm{Seg}_1, \mathrm{Seg}_2, \dots, \mathrm{Seg}_m$ of the path, $(S_{11}, S_{12}, \dots, S_{1m})$ represent scores corresponding to recognition results $(R_{11}, R_{12}, \dots, R_{1m})$ of a decoding path, min means the minimization function, and a and b are pre-defined constants; and identifying the path with the highest path score and using that path to interpret the handwritten characters in the input image.
2. The method of claim 1, wherein the recognition results $(R_{11}, R_{12}, \dots, R_{1m})$ of the decoding path represent top character candidates for $\mathrm{Seg}_1, \mathrm{Seg}_2, \dots, \mathrm{Seg}_m$ of the path.
3. The method of claim 1, further comprising: estimating an upper baseline and a lower baseline for the input image based on the connected regions; and estimating a character height based on the upper and lower baselines.
4. The method of claim 3, wherein the upper and lower baselines are estimated by: detecting top and bottom positions of each connected region of the input image; clustering the top positions into a first higher center and a first lower center, wherein the first lower center is selected as the upper baseline; and clustering the bottom positions into a second higher center and a second lower center, wherein the second higher center is selected as the lower baseline.
5. The method of claim 1, wherein the segmentation positions are determined by: obtaining a top contour and a bottom contour of the input image; selecting a plurality of extremum points in the top and bottom contours as potential segmentation positions; and for each of the plurality of potential segmentation positions, drawing a vertical line at the potential segmentation position, determining whether the vertical line crosses a foreground of the input image more than two times and if so, deleting the potential segmentation position, and determining whether the vertical line crosses a circle in the foreground and if so, deleting the potential segmentation position.
6. The method of claim 5, wherein the foreground of the input image comprises a connected region of black pixels.
7. The method of claim 5, further comprising: for each of the plurality of potential segmentation positions, if the vertical line does not cross the foreground of the input image more than two times and does not cross a circle in the foreground, keeping the potential segmentation position as a segmentation position.
8. The method of claim 1, wherein the segmentation positions define one or more segments in the input image.
9. The method of claim 1, wherein each segment comprises one or more non-overlapping black pixels in the input image.
10. The method of claim 1, wherein the recognition results and corresponding scores are obtained by a character classifier.
11. A computer program product comprising a computer usable non-transitory medium having a computer readable program code embedded therein for controlling a data processing apparatus, the computer readable program code configured to cause the data processing apparatus to execute a process for handwriting recognition, the process comprising: detecting connected regions in an input image comprising handwritten characters; determining a plurality of segmentation positions of the input image; obtaining multiple alternative paths of the input image, each path containing one or more segments, each path covering all of the connected regions of the input image, the multiple alternative paths being different from each other; for each path: obtaining a plurality of recognition results for the segments of the path, wherein each recognition result represents a character candidate for a corresponding segment; obtaining a plurality of scores corresponding to the recognition results of the segments; and calculating a path score for the path based on segment lengths and the scores corresponding to the recognition results of the segments using the following equation: $S_{\mathrm{path}} = a \cdot (S_{11} \cdot \mathrm{Len}_1 + S_{12} \cdot \mathrm{Len}_2 + \dots + S_{1m} \cdot \mathrm{Len}_m) + b \cdot \min(S_{11}, S_{12}, \dots, S_{1m})$, wherein $\mathrm{Len}_1, \mathrm{Len}_2, \dots, \mathrm{Len}_m$ represent respective segment lengths for $\mathrm{Seg}_1, \mathrm{Seg}_2, \dots, \mathrm{Seg}_m$ of the path, $(S_{11}, S_{12}, \dots, S_{1m})$ represent scores corresponding to recognition results $(R_{11}, R_{12}, \dots, R_{1m})$ of a decoding path, min means the minimization function, and a and b are pre-defined constants; and identifying the path with the highest path score and using that path to interpret the handwritten characters in the input image.
12. The computer program product of claim 11, wherein the recognition results $(R_{11}, R_{12}, \dots, R_{1m})$ of the decoding path represent top character candidates for $\mathrm{Seg}_1, \mathrm{Seg}_2, \dots, \mathrm{Seg}_m$ of the path.
13. The computer program product of claim 11, wherein the segmentation positions define one or more segments in the input image.
14. The computer program product of claim 11, wherein each segment comprises one or more non-overlapping black pixels in the input image.
15. The computer program product of claim 11, wherein the recognition results and corresponding scores are obtained by a character classifier.
16. A computer program product comprising a computer usable non-transitory medium having a computer readable program code embedded therein for controlling a data processing apparatus, the computer readable program code configured to cause the data processing apparatus to execute a process for handwriting recognition, the process comprising: detecting connected regions in an input image comprising handwritten characters; estimating an upper baseline and a lower baseline for the input image based on the connected regions, wherein the upper and lower baselines are estimated by: detecting top and bottom positions of each connected region of the input image; clustering the top positions into a first higher center and a first lower center, wherein the first lower center is selected as the upper baseline; and clustering the bottom positions into a second higher center and a second lower center, wherein the second higher center is selected as the lower baseline; estimating a character height based on the upper and lower baselines; determining a plurality of segmentation positions of the input image; obtaining multiple alternative paths of the input image, each path containing one or more segments, each path covering all of the connected regions of the input image, the multiple alternative paths being different from each other; for each path: obtaining a plurality of recognition results for the segments of the path, wherein each recognition result represents a character candidate for a corresponding segment; obtaining a plurality of scores corresponding to the recognition results of the segments; and calculating a path score for the path based on segment lengths and the scores corresponding to the recognition results of the segments; and identifying the path with the highest path score and using that path to interpret the handwritten characters in the input image.
17. A computer program product comprising a computer usable non-transitory medium having a computer readable program code embedded therein for controlling a data processing apparatus, the computer readable program code configured to cause the data processing apparatus to execute a process for handwriting recognition, the process comprising: detecting connected regions in an input image comprising handwritten characters; determining a plurality of segmentation positions of the input image, wherein the segmentation positions are determined by: obtaining a top contour and a bottom contour of the input image; selecting a plurality of extremum points in the top and bottom contours as potential segmentation positions; and for each of the plurality of potential segmentation positions, drawing a vertical line at the potential segmentation position, determining whether the vertical line crosses a foreground of the input image more than two times and if so, deleting the potential segmentation position, and determining whether the vertical line crosses a circle in the foreground and if so, deleting the potential segmentation position; obtaining multiple alternative paths of the input image, each path containing one or more segments, each path covering all of the connected regions of the input image, the multiple alternative paths being different from each other; for each path: obtaining a plurality of recognition results for the segments of the path, wherein each recognition result represents a character candidate for a corresponding segment; obtaining a plurality of scores corresponding to the recognition results of the segments; and calculating a path score for the path based on segment lengths and the scores corresponding to the recognition results of the segments; and identifying the path with the highest path score and using that path to interpret the handwritten characters in the input image.
18. The computer program product of claim 17, wherein the foreground of the input image comprises a connected region of black pixels.
19. The computer program product of claim 17, wherein the process further comprises: for each of the plurality of potential segmentation positions, if the vertical line does not cross the foreground of the input image more than two times and does not cross a circle in the foreground, keeping the potential segmentation position as a segmentation position.
Description
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
(11) Embodiments of the present invention are directed to a method that improves the performance of handwriting recognition by calculating path scores so as to identify the path with the highest score as the basis for interpreting handwritten characters. Specifically, the method comprises the following steps: detecting connected regions in an input image comprising handwritten characters; determining a plurality of segmentation positions of the input image; obtaining a plurality of recognition results for each segment of each path in the input image, wherein each recognition result represents a character candidate for the segment and each path comprises one or more segments; obtaining a plurality of scores corresponding to the recognition results; calculating scores for each path in the input image based on segment lengths and the scores corresponding to the recognition results; and using the path with the highest score to interpret the handwritten characters in the input image.
(12) One embodiment of the present invention performs a baseline estimation process by detecting top and bottom positions of each connected region of an input image, clustering the top positions into a first higher center and a first lower center, wherein the first lower center is selected as the upper baseline, and clustering the bottom positions into a second higher center and a second lower center, wherein the second higher center is selected as the lower baseline.
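The baseline estimation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names `two_means_1d` and `estimate_baselines` are hypothetical, position values are assumed to grow downward (so the physically lower center has the larger value), and each baseline is assumed to be represented by the mean of its cluster. The top values are those of the eloquent example in this description; the bottom values are made up for illustration.

```python
from statistics import mean

def two_means_1d(values, max_iter=100):
    """Split 1-D values into two clusters with K-means (K=2); return both centers."""
    lo, hi = min(values), max(values)      # start from the two extreme values
    for _ in range(max_iter):
        # Assign every value to its nearest center (ties go to the smaller center).
        near_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        near_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not near_hi:                    # all values equal: a single cluster
            return lo, lo
        new_lo, new_hi = mean(near_lo), mean(near_hi)
        if (new_lo, new_hi) == (lo, hi):   # centers stopped moving
            break
        lo, hi = new_lo, new_hi
    return lo, hi                          # (smaller center, larger center)

def estimate_baselines(tops, bottoms):
    """Return (upper_baseline, lower_baseline), assuming y grows downward."""
    _, tops_lower_center = two_means_1d(tops)          # lower center of the tops
    bottoms_higher_center, _ = two_means_1d(bottoms)   # higher center of the bottoms
    return tops_lower_center, bottoms_higher_center

tops = [1.0, 2.0, 2.1, 2.2, 1.02]        # top positions from the eloquent example
bottoms = [3.0, 4.0, 3.1, 3.05, 3.0]     # hypothetical bottom positions
upper, lower = estimate_baselines(tops, bottoms)
char_height = lower - upper              # assumed: height is the baseline gap
```

Taking the character height as the distance between the two baselines is an assumption of this sketch; the description only states that the height is estimated based on them.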
(13) Another embodiment of the present invention determines segmentation positions in an input image through the following process: obtaining a top contour and a bottom contour of the input image; selecting a plurality of extremum points in the top and bottom contours as potential segmentation positions; for each of the plurality of potential segmentation positions, drawing a vertical line at the potential segmentation position, determining whether the vertical line crosses a foreground of the input image more than two times and if so, deleting the potential segmentation position, and determining whether the vertical line crosses a circle in the foreground and if so, deleting the potential segmentation position.
(14) Turning to
(15) Usually the memory 102 stores computer-executable instructions or software programs accessible to the CPU 101, which is configured to execute these software programs as needed in operation. Preferably, such software programs are designed to run on Windows OS, Macintosh OS, Unix X Windows, or other popular computer operating systems implementing a GUI (graphical user interface) operated through, for example, a touchscreen and/or a mouse and a keyboard, coupled with a display monitor. In one embodiment, such software in the memory 102 includes a recognizing program 108, which, when executed by the CPU 101, enables the computer 10 to recognize human handwritten characters. As will be described in detail below, the recognizing program 108 enables the computer 10 to recognize human handwriting by obtaining an image of handwritten characters (e.g., a scanned image), detecting connected regions and segmentation positions in the image, calculating a path score for each path in the image based on the score and length of each segment included in the path, and using the path with the highest path score as the basis to find the most plausible words for the handwritten characters.
(16) In addition to the recognizing program 108, the CPU 101 is also configured to execute other types of software (e.g., administrative software), applications (e.g., network communication application), operating systems, etc.
(17) In
(18) As shown in
(19)
(20) TABLE-US-00001
Connected Region | el | oq | u | en | t
Top Position | 1 | 2 | 3 | 4 | 5
Value | 1.0 | 2.0 | 2.1 | 2.2 | 1.02
At step S203, the top positions are clustered into two groups or centers based on their position values. In one embodiment, the clustering is done by applying a K-means function to the position values. For example, in the case of the eloquent image as illustrated above, the two centers include a higher center (1.0, 1.02) and a lower center (2.0, 2.1, 2.2). The lower center is determined to be the upper baseline at step S204. As a result, as shown in
(21) Back to
(22) TABLE-US-00002
Connected Region | el | oq | u | en | t
Bottom Position | 1 | 2 | 3 | 4 | 5
Value | 2.0 | 1.0 | 2.1 | 2.2 | 1.02
At step S303, the bottom positions are clustered into two groups or centers based on their position values. In one embodiment, the clustering is done by applying a K-means function to the position values. For example, in the case of the eloquent image as illustrated above, the two centers include a lower center (1.0, 1.02) and a higher center (2.0, 2.1, 2.2). The higher center is selected as the lower baseline at step S304. As a result, as shown in
(23) Back to
(24) Next, at step S105, the algorithm determines the segmentation positions in the input image. This segmentation position determination step is further illustrated in the sub-routine in
(25) As shown in
(26) At step S404, one of the positions in Set A is selected, for example, P1 in
(27) Continuing to step S408, the sub-routine determines whether any unprocessed position remains in Set A of the potential segmentation positions. If all positions have been processed, the sub-routine for determining the segmentation positions ends; otherwise, it continues to step S409, where another unprocessed position from Set A is selected and the determination process comprising the steps S405 to S409 is repeated.
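The two deletion tests applied to each potential segmentation position can be sketched as below. This is a minimal illustration under assumptions that the description does not spell out: the image is a binary grid (1 for foreground, 0 for background), a "crossing" is counted as one maximal vertical run of foreground pixels in the column, and a "circle" is treated as a background region fully enclosed by foreground. The function names are hypothetical.

```python
from collections import deque

def count_crossings(image, x):
    """Count maximal vertical runs of foreground (1) pixels in column x."""
    runs, inside = 0, False
    for row in image:
        if row[x] and not inside:
            runs += 1
        inside = bool(row[x])
    return runs

def hole_mask(image):
    """Mark background pixels that cannot be reached from the image border."""
    h, w = len(image), len(image[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and not image[y][x]:
                outside[y][x] = True
                queue.append((y, x))
    while queue:  # flood fill the outer background (4-connectivity)
        y, x = queue.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w and not image[ny][nx] and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[not image[y][x] and not outside[y][x] for x in range(w)] for y in range(h)]

def filter_positions(image, candidates):
    """Keep candidate columns that pass both deletion tests."""
    holes = hole_mask(image)
    kept = []
    for x in candidates:
        if count_crossings(image, x) > 2:      # crosses the foreground more than twice
            continue
        if any(row[x] for row in holes):       # passes through an enclosed loop
            continue
        kept.append(x)
    return kept

# A small "o" with a trailing stroke: column 2 cuts through the loop, column 5 does not.
image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
segmentation_positions = filter_positions(image, [2, 5])
```

In this toy image only column 5 survives, because the vertical line at column 2 passes through the enclosed loop.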
(28) As another example, if the position P5 in
(29) It should be understood that the positions and lines described above are for illustration only and are not limited to those shown in
(30) The segmentation position determination sub-routine in
(31) As used herein, the term "path" refers to a series of segments covering every pixel of an input image, where each segment comprises a number of different and non-overlapping pixels in the input image. In other words, a path may be formed from various combinations of different segments in the input image. Again, take the input image of eloquent in
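One way to picture how alternative paths arise from a set of segmentation positions is sketched below. It assumes, purely for illustration, that segments are horizontal spans between cut positions and that a path is obtained by keeping or dropping each cut; the description does not prescribe this enumeration, and `enumerate_paths` is a hypothetical name.

```python
from itertools import product

def enumerate_paths(cuts, width):
    """Yield every path as a list of (start, end) spans covering [0, width)."""
    cuts = sorted(c for c in set(cuts) if 0 < c < width)
    for keep in product((False, True), repeat=len(cuts)):
        # Each kept cut becomes a segment boundary; dropped cuts merge neighbours.
        bounds = [0] + [c for c, k in zip(cuts, keep) if k] + [width]
        yield list(zip(bounds[:-1], bounds[1:]))

# Two cut positions in a 10-pixel-wide image give four alternative paths.
paths = list(enumerate_paths([3, 7], 10))
```

With n cut positions this produces 2^n paths, each covering the full width with non-overlapping segments.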
(32) Specifically, as shown in
(33) TABLE-US-00003
Rec. Result | $R_{11}$ | $R_{12}$ | $R_{13}$ | $R_{14}$ | $R_{15}$ | ...
Character | e | c | l | o | i | ...
Further, for each recognition result, a corresponding score is calculated as follows:
(34) TABLE-US-00004
Score | $S_{11}$ | $S_{12}$ | $S_{13}$ | $S_{14}$ | $S_{15}$ | ...
Value | 9 | 7 | 5 | 3 | 2 | ...
As seen above, with the scores sorted in descending order, the top candidate for Segment 1 of Path 1 is the character e.
(35) In operation, the recognition list and corresponding scores are generated by the character classifier, which may be embodied as a software module, according to the following process: when an image (denoted as I) is received, it is compared with each element in a set of character candidates (denoted as C, containing N elements, where N is the number of classes or categories). For each comparison, a score is given to indicate the similarity between the input image and the compared element. As a result, there will be N pairs of score and element combinations (e.g., $S_1$-$E_1$, $S_2$-$E_2$, ..., $S_N$-$E_N$). By sorting the scores (e.g., $S_1, S_2, \dots, S_N$), an output comprising a candidate list and a score list can be generated. To give a more specific example, each element (denoted as a class) in C may comprise a template, which is essentially a k-dimensional vector. In one embodiment, such a template can be obtained as follows: certain training data (i.e., input image data) are classified, for example, labeled as Class A, from which a plurality of labeled training samples can be obtained, with each sample being transformed into a k-dimensional vector; thereafter, the mean of these k-dimensional vectors is taken as one template for Class A. When an image is received, it is converted into another k-dimensional vector via a feature extraction process. Then, the distances between the input vector and the N templates are measured using, for example, the Euclidean metric. The measured distances (denoted as $d_1, d_2, \dots, d_N$) can be converted into scores $S_1, S_2, \dots, S_N$ through the following equation: $S_i = 1/(1 + \exp(d_i))$. If the score list (S) is sorted in descending order, the corresponding candidate or recognition list is presented in the same order.
It should be noted that the distances or scores can also be obtained using other metrics or models, such as the Mahalanobis distance, an HMM, or softmax regression.
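The template-based scoring just described can be sketched as follows. This is an illustrative reading of the paragraph, not the classifier actually used: the names `make_template` and `classify`, the two-dimensional feature vectors, and the class labels are all hypothetical, and feature extraction is assumed to have already produced the k-dimensional vectors.

```python
import math

def make_template(samples):
    """Average the k-dimensional training vectors of one class into a template."""
    k = len(samples[0])
    return [sum(s[i] for s in samples) / len(samples) for i in range(k)]

def classify(feature, templates):
    """Return (label, score) pairs sorted by descending score."""
    scored = []
    for label, template in templates.items():
        d = math.dist(feature, template)            # Euclidean distance
        # S = 1 / (1 + exp(d)), written as exp(-d) / (1 + exp(-d)) so that
        # large distances do not overflow; the two forms are algebraically equal.
        score = math.exp(-d) / (1.0 + math.exp(-d))
        scored.append((label, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical training vectors for two classes.
templates = {
    "e": make_template([[0.0, 0.0], [2.0, 0.0]]),   # template [1.0, 0.0]
    "c": make_template([[4.0, 0.0]]),               # template [4.0, 0.0]
}
ranked = classify([1.0, 0.0], templates)
```

Because the distance is never negative, this conversion yields scores no greater than 0.5, with a smaller distance giving a higher score; a score of exactly 0.5 corresponds to a perfect match with the template.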
(36) Back to
$S_{\mathrm{path}} = a \cdot (S_{11} \cdot \mathrm{Len}_1 + S_{12} \cdot \mathrm{Len}_2 + \dots + S_{1m} \cdot \mathrm{Len}_m) + b \cdot \min(S_{11}, S_{12}, \dots, S_{1m})$
where $\mathrm{Len}_1, \mathrm{Len}_2, \dots, \mathrm{Len}_m$ represent the segment lengths for $\mathrm{Seg}_1, \mathrm{Seg}_2, \dots, \mathrm{Seg}_m$, respectively, $(S_{11}, S_{12}, \dots, S_{1m})$ are the corresponding scores for $(R_{11}, R_{12}, \dots, R_{1m})$, min denotes the minimum function, and a and b are pre-defined constants.
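As a worked illustration of the path-score equation above, the sketch below computes the score for two candidate paths and picks the higher one. The constants a = b = 1, the segment scores, and the segment lengths are hypothetical values chosen only to make the arithmetic easy to follow; the description does not give values for a and b or state the unit of the segment lengths.

```python
def path_score(scores, lengths, a=1.0, b=1.0):
    """S_path = a * sum(S_1i * Len_i) + b * min(S_1i) for one path."""
    if not scores or len(scores) != len(lengths):
        raise ValueError("need one top-candidate score per segment length")
    weighted = sum(s * length for s, length in zip(scores, lengths))
    return a * weighted + b * min(scores)

def best_path(paths, a=1.0, b=1.0):
    """Return the name of the path with the highest path score."""
    return max(paths, key=lambda name: path_score(*paths[name], a=a, b=b))

# Hypothetical candidate paths: (top-candidate scores, segment lengths).
candidates = {
    "path_1": ([9, 8], [4, 6]),          # 9*4 + 8*6 = 84, min = 8
    "path_2": ([9, 3, 8], [4, 2, 4]),    # 9*4 + 3*2 + 8*4 = 74, min = 3
}
winner = best_path(candidates)
```

The min term lowers the score of a path that contains even one poorly recognized segment, which is why path_2 loses here by more than the length-weighted sum alone would suggest.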
(37) After the path score is calculated for the selected path using the above equation, at step S506, the score-calculating process further determines whether there is any unprocessed path in the image, and if so, the unprocessed path is selected at step S507 for score calculation by repeating the steps S504 to S506. Otherwise, the process ends.
(38) Based on the calculated scores for each decoding path, the algorithm identifies the path with the highest score, which is then used for interpreting the handwritten characters in the input image. For example, as shown in
(39) It will be apparent to those skilled in the art that various modifications and variations can be made in the above-described method and system of the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention cover modifications and variations that come within the scope of the appended claims and their equivalents.