TEXT IMAGE CORRECTION METHOD AND APPARATUS
20240161523 · 2024-05-16
Abstract
A text image correction method and a corresponding text image correction apparatus are provided. Frequency information of the row-direction cumulative curve used by the method is sensitive to the error between the compensation angle and the real tilt angle, and the method thus has good robustness. The method can accurately estimate the compensation angle for a tilt angle and correct a tilted text image. The method and apparatus can be applied to scenarios such as image pre-processing, automatic compensation for angles of scanned text images, and automatic compensation for tilt angles of mobile phone photos.
Claims
1. A text image correction method, comprising the following steps: preprocessing to-be-corrected images into binary images; sequentially rotating the binary images in the same direction with a predetermined step size, recording a cumulative rotation angle upon each rotation and calculating a row cumulant of a current binary image until the current binary image is rotated to a threshold angle; extracting a frequency satisfying preset conditions for the row cumulant of each binary image; and correcting the to-be-corrected images by using the cumulative rotation angle corresponding to the maximum frequency among the frequencies of the binary images satisfying the preset conditions as a compensation angle.
2. The text image correction method according to claim 1, wherein the preprocessing to-be-corrected images into binary images further comprises: converting the to-be-corrected images into grayscale images; and converting the grayscale images into the binary images based on a maximum between-class variance method.
3. The text image correction method according to claim 1, wherein the calculating a row cumulant of the current binary image further comprises: calculating a row cumulant of each row for the current binary image; and sequentially constructing the row cumulant of each row into a column vector $S^t$ as the row cumulant of the current binary image, a calculation formula for the row cumulant of each row being:
$$s_i^t = \sum_j d_{(i,j)}^t$$
wherein $s_i^t$ is a row cumulant of an $i$-th row of a current binary image $D^t$, and $d_{(i,j)}^t$ is an element in the $i$-th row and $j$-th column of the current binary image $D^t$.
4. The text image correction method according to claim 1, wherein the extracting a frequency satisfying preset conditions for the row cumulant of each binary image further comprises: performing sliding window smoothing filtering on the row cumulant of the current binary image to obtain a corresponding filtering sequence $Q^t$; performing mean subtraction processing on the filtering sequence $Q^t$ to obtain a corresponding mean-subtracted sequence $H^t$; performing spectral analysis on the mean-subtracted sequence $H^t$ to obtain a discrete sequence $P^t$; and extracting a frequency $f^t$ satisfying the preset conditions from the discrete sequence $P^t$.
5. The text image correction method according to claim 4, wherein a calculation formula for the filtering sequence $Q^t$ is:
6. The text image correction method according to claim 5, wherein a calculation formula for the mean-subtracted sequence $H^t$ is:
$$h_i^t = q_i^t - M(Q^t)$$
wherein $h_i^t$ is an $i$-th element of the mean-subtracted sequence $H^t$, and $M(*)$ represents calculation of a mean of an input sequence.
7. The text image correction method according to claim 6, wherein a calculation formula for the discrete sequence $P^t$ is:
8. The text image correction method according to claim 7, wherein a calculation formula for the frequency $f^t$ satisfying the preset conditions is:
9. The text image correction method according to claim 1, wherein a calculation formula for sequentially rotating the binary images in the same direction is: * represents rounding a result
10. A text image correction apparatus, comprising: a processor and a memory, wherein the processor reads a computer program in the memory to perform the following operations: preprocessing to-be-corrected images into binary images; sequentially rotating the binary images in the same direction with a predetermined step size, recording a cumulative rotation angle upon each rotation, and calculating a row cumulant of a current binary image until the current binary image is rotated to a threshold angle; extracting a frequency satisfying preset conditions for the row cumulant of each binary image; and correcting the to-be-corrected images by using the cumulative rotation angle corresponding to the maximum frequency among the frequencies of the binary images satisfying the preset conditions as a compensation angle.
Description
DETAILED DESCRIPTION
[0055] The technical content of the present disclosure is described in detail below with reference to the accompanying drawings and specific embodiments.
[0056] As shown in
[0057] 101: Preprocess to-be-corrected images into binary images. Specifically:
[0058] 1011: Convert the to-be-corrected images into grayscale images.
[0059] As shown in
[0060] In general, the to-be-corrected image D? is a color image, represented by the data of three color channels (r/g/b), each having dimensions of $m \times n$.
[0061] The to-be-corrected image D? is converted into a grayscale image $D^g$ according to the following formula:
$$D^g = 0.2989\,R + 0.5870\,G + 0.1140\,B \tag{1}$$
[0062] In Formula (1), $R$, $G$, and $B$ represent the red, green, and blue component matrices of the original color image D?, respectively, and the elements of each matrix relate to the original color image as follows:
[0063] In Formula (2), $r_{(i,j)}$, $g_{(i,j)}$, and $b_{(i,j)}$ represent the elements in the $i$-th row and $j$-th column of the $R$, $G$, and $B$ matrices, respectively.
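The weighted sum in Formula (1) can be illustrated with a minimal Python sketch. The function name `to_grayscale`, the nested-list image layout, and the sample pixels are assumptions made for illustration only; they do not come from the disclosure.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each in 0..255 (assumed layout)

# Channel weights taken from Formula (1).
W_R, W_G, W_B = 0.2989, 0.5870, 0.1140


def to_grayscale(image: List[List[Pixel]]) -> List[List[float]]:
    """Apply Formula (1) to every pixel of an m x n RGB image."""
    return [[W_R * r + W_G * g + W_B * b for (r, g, b) in row] for row in image]


# Illustrative 2 x 2 image: white, black, pure red, pure blue.
rgb = [[(255, 255, 255), (0, 0, 0)],
       [(255, 0, 0), (0, 0, 255)]]
gray = to_grayscale(rgb)
```

Because the three weights sum to 0.9999, a pure white pixel maps to slightly less than 255.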
[0064] 1012: Convert the grayscale images into the binary images based on a maximum between-class variance method.
[0065] After the grayscale image $D^g$ is obtained, in order to eliminate the influence of the background light of the photographing environment on the determination of the text area, a suitable threshold $T$ is calculated using the maximum between-class variance method, and the grayscale image is then converted into a binary image $D^b$ according to the following formula:
[0066] In Formula (3), $d_{(i,j)}^g$ and $d_{(i,j)}^b$ represent the elements in the $i$-th row and $j$-th column of $D^g$ and $D^b$, respectively.
[0067] The maximum between-class variance method, also referred to as the Otsu method, is a classical and commonly used threshold selection algorithm. The method automatically selects a global threshold $T$ from the histogram statistics of the whole image, and includes the following algorithm steps:
[0068] Step 1: Calculate a histogram of the image: count the number of pixels falling in each of the 256 bins (0-255) over all pixels of the image.
[0069] Step 2: Normalize the histogram: divide the number of pixels in each bin by the total number of pixels.
[0070] Step 3: Start the iteration at i = 0, where i represents the classification threshold, namely a gray level.
[0071] Step 4: Using the normalized histogram, obtain the ratio w0 of pixels with gray levels of 0-i (referred to as foreground pixels) to the whole image and the average grayscale u0 of the foreground pixels; likewise obtain the ratio w1 of pixels with gray levels of i-255 (referred to as background pixels) to the whole image and the average grayscale u1 of the background pixels.
[0072] Step 5: Calculate the between-class variance of the foreground pixels and the background pixels: g = w0*w1*(u0-u1)*(u0-u1).
[0073] Step 6: Increment i and return to Step 4; end the iteration when i is 256.
[0074] Step 7: Take the i value corresponding to the maximum g as the global threshold of the image.
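Steps 1 to 7 above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation. It assumes integer gray levels in 0..255, assigns gray level i itself to the first class (so the second class covers i+1..255), and skips thresholds that leave one class empty. The `binarize` helper assumes that pixels brighter than the threshold map to 1; Formula (3) is not reproduced in this text, so that polarity is a guess.

```python
from typing import List


def otsu_threshold(gray: List[List[int]]) -> int:
    """Return the gray level i that maximizes g = w0 * w1 * (u0 - u1)^2."""
    # Steps 1-2: 256-bin histogram, normalized by the pixel count.
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    prob = [h / total for h in hist]

    best_i, best_g = 0, -1.0
    # Steps 3-6: try every gray level as the class boundary.
    for i in range(256):
        w0 = sum(prob[: i + 1])       # share of pixels with levels 0..i
        w1 = sum(prob[i + 1:])        # share of pixels with levels i+1..255
        if w0 == 0.0 or w1 == 0.0:
            continue                  # one class is empty, no variance defined
        u0 = sum(k * prob[k] for k in range(i + 1)) / w0
        u1 = sum(k * prob[k] for k in range(i + 1, 256)) / w1
        g = w0 * w1 * (u0 - u1) * (u0 - u1)
        if g > best_g:
            best_i, best_g = i, g
    # Step 7: the level with the largest between-class variance.
    return best_i


def binarize(gray: List[List[int]], threshold: int) -> List[List[int]]:
    """Assumed polarity: pixels above the threshold become 1, others 0."""
    return [[1 if v > threshold else 0 for v in row] for row in gray]


# Illustrative image with two well-separated gray levels.
sample = [[10, 10, 200],
          [10, 200, 200]]
t = otsu_threshold(sample)
binary = binarize(sample, t)
```

When several thresholds give the same maximum g, as in this two-level sample, the sketch keeps the first one encountered.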
[0075] 102: Sequentially rotate the binary images in the same direction with a predetermined step size, record a cumulative rotation angle upon each rotation, and calculate a row cumulant of a current binary image until the current binary image is rotated to a threshold angle.
[0076] In order to extract the distribution of text and blank space in the image under different rotation angles, the binary image $D^b$ is rotated step by step through the full range of candidate angles. At each angle, the two-dimensional information is reduced to one dimension by calculating the row cumulant, and several strong frequency components are then extracted by spectral analysis of the row cumulant and stored for subsequent analysis.
[0077] As shown in
[0078] In one embodiment of the present disclosure, the same direction is clockwise or counterclockwise. Since the to-be-corrected image D? is skewed clockwise or counterclockwise, there will always be a unique angle corresponding to the horizontal text in the image in the process of sequentially rotating in a certain direction by 180 degrees.
[0079] In one embodiment of the present disclosure, the step size $\Delta_\theta$ determines the accuracy of the skew angle estimation and the size of the systematic error, and also affects the computational complexity and delay of the whole algorithm. Assuming that the cumulative rotation angle of the binary image $D^b$ is $\theta_t$ (increased from 0 degrees by 0.5 degrees each time), the current rotated image is $D^t$, and the rotation process is:
[0080] In Formula (4), $d_{(x,y)}^t$ is the element of the rotated binary image in the $x$-th row and $y$-th column, $\theta_t$ is the rotation angle of the rotated binary image, $d_{(i,j)}^b$ is the element of the unrotated binary image in the $i$-th row and $j$-th column, and * represents rounding a result.
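A rotation with rounded target coordinates, as described for Formula (4), might look like the following Python sketch. Formula (4) itself is not reproduced in this text, so several details are assumptions: rotation about the image centre, an output canvas of the same size as the input, dropping pixels that land outside the canvas, and the sign convention of the angle. The name `rotate_binary` is hypothetical.

```python
import math
from typing import List


def rotate_binary(image: List[List[int]], theta_deg: float) -> List[List[int]]:
    """Rotate a binary image by theta_deg degrees using rounded coordinates.

    Assumptions (not from the source): rotation about the image centre,
    same-size output canvas, pixels mapped outside the canvas are dropped.
    """
    m, n = len(image), len(image[0])
    ci, cj = (m - 1) / 2.0, (n - 1) / 2.0
    c = math.cos(math.radians(theta_deg))
    s = math.sin(math.radians(theta_deg))
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if image[i][j] == 0:
                continue
            # Rotate the offset from the centre, then round to the nearest cell.
            x = round(ci + (i - ci) * c - (j - cj) * s)
            y = round(cj + (i - ci) * s + (j - cj) * c)
            if 0 <= x < m and 0 <= y < n:
                out[x][y] = 1
    return out


# Illustrative 3 x 3 image: a horizontal bar becomes a vertical bar at 90 degrees.
bar = [[0, 0, 0],
       [1, 1, 1],
       [0, 0, 0]]
turned = rotate_binary(bar, 90)
```

Mapping source pixels forward in this way can leave small holes in the rotated image at some angles; sampling each target pixel from the source instead avoids this, at the cost of departing from the form described above.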
[0081] After the rotated binary image is obtained, the row cumulant $S^t$ of the image may be calculated.
[0082] 1021: Calculate a row cumulant of each row for the current binary image.
[0083] A calculation formula for the row cumulant of each row is:
$$s_i^t = \sum_j d_{(i,j)}^t \tag{5}$$
[0084] In Formula (5), $s_i^t$ is the row cumulant of the $i$-th row of the current binary image $D^t$, and $d_{(i,j)}^t$ is the element in the $i$-th row and $j$-th column of the current binary image $D^t$.
[0085] 1022: Sequentially construct the row cumulant of each row into a column vector $S^t$ as the row cumulant of the current binary image.
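Formula (5) and the construction of $S^t$ amount to summing each row of the binary image. A minimal Python sketch follows; the function name `row_cumulant` and the sample image are illustrative assumptions.

```python
from typing import List


def row_cumulant(image: List[List[int]]) -> List[int]:
    """Return S^t: one entry per row, each the sum of that row's pixels."""
    return [sum(row) for row in image]


# Illustrative image: two "text" rows separated by blank rows.
page = [[0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0],
        [1, 1, 0, 0]]
s = row_cumulant(page)
```

Rows containing text give large entries and blank rows give zeros, which is the oscillation that the later spectral analysis relies on.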
[0086] It is assumed that, in one embodiment of the present disclosure, the to-be-corrected image D? is skewed clockwise by 8.5 degrees. When the corresponding binary image $D^b$ is rotated counterclockwise by 0.5 degrees, that is, $\theta_t = 0.5$ degrees, the variation curve of the row cumulant of the binary image $D^b$ is shown in
[0087] 103: Extract a frequency satisfying preset conditions for the row cumulant of each binary image. Specifically:
[0088] 1031: Perform sliding window smoothing filtering on the row cumulant of the current binary image to obtain a corresponding filtering sequence $Q^t$.
[0089] As shown in
[0090] In Formula (6), $q_i^t$ is the $i$-th element of $Q^t$, $L$ is the width of the sliding window, and $s_{i+j}^t$ is the row cumulant of the $(i+j)$-th row of the current binary image $D^t$.
[0091] In one embodiment of the present disclosure, when the sliding window extends beyond the actual length of $S^t$, the invalid elements in the window default to 0.
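The sliding window smoothing can be sketched in Python as below. Formula (6) is not reproduced in this text, so the exact form is an assumption: the window is taken to start at index i and cover the next L elements, and the sum is divided by L. Only the zero default for out-of-range elements comes directly from the description. The name `sliding_mean` is hypothetical.

```python
from typing import List


def sliding_mean(s: List[float], window: int) -> List[float]:
    """Smooth s with a forward window of the given width.

    Assumed form: q[i] = (1 / L) * sum(s[i + j] for j in 0..L-1),
    with elements past the end of s counted as 0.
    """
    n = len(s)
    q = []
    for i in range(n):
        acc = 0.0
        for j in range(window):
            if i + j < n:          # elements beyond the sequence default to 0
                acc += s[i + j]
        q.append(acc / window)
    return q


q = sliding_mean([3, 0, 3, 0], window=2)
```

The last entry is pulled toward zero because half of its window falls outside the sequence.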
[0092] 1032: Perform mean subtraction processing on the filtering sequence $Q^t$ to obtain a corresponding mean-subtracted sequence $H^t$.
[0093] In order to prevent the influence of a direct current signal on the subsequent frequency analysis, mean subtraction processing is performed on the filtering sequence $Q^t$. A calculation formula for the mean-subtracted sequence $H^t$ is:
$$h_i^t = q_i^t - M(Q^t) \tag{7}$$
[0094] In Formula (7), $h_i^t$ is the $i$-th element of the mean-subtracted sequence $H^t$, and $M(*)$ represents calculation of the mean of an input sequence.
[0095] 1033: Perform spectral analysis on the mean-subtracted sequence $H^t$ to obtain a discrete sequence $P^t$.
[0096] DFT spectral analysis is performed on the mean-subtracted sequence $H^t$. A calculation formula for the discrete sequence $P^t$ is:
[0097] In Formula (8), $h_j^t$ is the $j$-th element of the mean-subtracted sequence $H^t$, $p_k^t$ is the $k$-th element of the discrete sequence $P^t$, and $N$ is the smallest power of 2 greater than the length of the sequence $H^t$.
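Mean subtraction (Formula (7)) followed by the DFT can be sketched in pure Python as follows. Formula (8) is not reproduced in this text, so two points are assumptions: the sequence is zero-padded up to $N$, and $p_k^t$ is taken as the magnitude of the $k$-th DFT coefficient. All function names are hypothetical, and the direct summation is chosen for readability rather than speed.

```python
import cmath
from typing import List


def next_pow2_above(length: int) -> int:
    """Smallest power of 2 strictly greater than length."""
    n = 1
    while n <= length:
        n *= 2
    return n


def mean_subtract(q: List[float]) -> List[float]:
    """Formula (7): remove the mean (the direct current component)."""
    mean = sum(q) / len(q)
    return [v - mean for v in q]


def dft_magnitude(h: List[float]) -> List[float]:
    """Zero-pad h to N points and return |DFT| for k = 0..N-1 (assumed form)."""
    n = next_pow2_above(len(h))
    padded = list(h) + [0.0] * (n - len(h))
    return [
        abs(sum(padded[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)))
        for k in range(n)
    ]


# Illustrative input: a sequence that alternates every sample.
h = mean_subtract([2, 0, 2, 0, 2, 0])
p = dft_magnitude(h)
peak = max(range(len(p)), key=lambda k: p[k])
```

For this alternating input the spectrum peaks at index N/2, the highest frequency the sampling can represent, and the component at index 0 vanishes because the mean was removed.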
[0098] In one embodiment of the present disclosure, the discrete sequence $P^t$ describes how much each frequency component contributes to the mean-subtracted sequence $H^t$, which corresponds to the likelihood that text and blank space are regularly distributed in the current binary image $D^t$. Frequency analysis of $S^t$ therefore reveals how the distribution of text and blank space along the row direction of the binary image $D^t$ varies with the cumulative rotation angle, as shown in
[0099] 1034: Extract a frequency $f^t$ satisfying the preset conditions from the discrete sequence $P^t$.
[0100] The main frequency components in $P^t$ are extracted. As shown in Formula (9), each index $k$ satisfying the conditions is found and converted into the corresponding frequency, and the frequencies $f^t$ so obtained form an output vector $F^t$.
[0101] A calculation formula for the frequency $f^t$ satisfying the preset conditions is:
[0102] In Formula (9), Fs is the sampling rate of the row cumulant of the current binary image, and ? is a configurable system parameter. In one embodiment of the present disclosure, the sampling rate Fs is an algorithm parameter with a value of 50 Hz, and ? sets the standard for how large a frequency contribution must be; it has a value of 15. If ? is too small, more frequency components are misjudged as significant. Conversely, if it is too large, informative frequency components may be missed. Ideally, one or two frequency components with clearly greater contribution are selected in each rotation.
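Since Formula (9) is not reproduced in this text, the selection rule in the following Python sketch is a guess. It assumes that an index k is kept when its magnitude exceeds the configurable parameter times the mean magnitude of the half-spectrum, and that k is converted to a frequency as k * Fs / N. Only the values Fs = 50 Hz and 15 come from the description; the name `extract_frequencies`, the parameter name `ratio`, and the comparison itself are hypothetical.

```python
from typing import List


def extract_frequencies(p: List[float], fs: float = 50.0, ratio: float = 15.0) -> List[float]:
    """Return frequencies whose magnitude stands out in the spectrum p.

    Assumed rule (not from the source): keep index k when
    p[k] > ratio * mean(p[1 .. N/2]), then convert with f = k * fs / N.
    """
    n = len(p)
    half = p[1 : n // 2 + 1]        # skip k = 0, keep up to the Nyquist index
    mean_mag = sum(half) / len(half)
    return [
        k * fs / n
        for k, mag in enumerate(half, start=1)
        if mag > ratio * mean_mag
    ]


# Illustrative spectrum with a single strong component at k = 8 (and its mirror).
spectrum = [0.0] * 64
spectrum[8] = 100.0
spectrum[56] = 100.0
freqs = extract_frequencies(spectrum)
```

Under this assumed rule, lowering `ratio` admits weaker components and raising it may return an empty list, which matches the trade-off described above.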
[0103] After the frequency satisfying the preset conditions is extracted for the current binary image $D^t$, it is associated with the correspondingly recorded cumulative rotation angle $\theta_t$. Then, $\theta_t$ is updated according to the step size $\Delta_\theta$:
$$\theta_t := \theta_t + \Delta_\theta \tag{10}$$
[0104] The extraction and calculation process for the next binary image $D^t$ is then entered, and the process ends when $\theta_t = 180$ degrees.
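The sweep over rotation angles can be sketched as a simple loop that records each cumulative angle together with the frequencies extracted at that angle. In this Python sketch the `analyze` callback stands in for the rotate, accumulate, filter, and extract steps; it, the function name `sweep`, and the toy callback are illustrative assumptions. Whether the final angle of 180 degrees is itself analyzed is not stated, and the sketch includes it.

```python
from typing import Callable, List, Tuple


def sweep(
    analyze: Callable[[float], List[float]],
    step_deg: float = 0.5,
    stop_deg: float = 180.0,
) -> List[Tuple[float, List[float]]]:
    """Record (cumulative angle, extracted frequencies) for every step."""
    records = []
    steps = round(stop_deg / step_deg)
    for t in range(1, steps + 1):
        # Same result as repeating theta := theta + step, without float drift.
        theta = t * step_deg
        records.append((theta, analyze(theta)))
    return records


# Toy stand-in for the per-angle analysis: one frequency that peaks at 8.5 degrees.
records = sweep(lambda theta: [10.0 - abs(theta - 8.5)])
```

Computing each angle as a multiple of the step, rather than by repeated addition, keeps the recorded angles exact for step sizes such as 0.5.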
[0105] 104: Correct the to-be-corrected images by using the cumulative rotation angle corresponding to the maximum frequency among the frequencies of the binary images satisfying the preset conditions as a compensation angle.
[0106] Ideally, if the text in the binary image is not skewed, the corresponding row cumulative curve of the image should oscillate according to the law of
[0107] A frequency component sequence ? F. is formed by splicing the output vector F.sup.t of each binary image end to end. The highest frequency is searched from each element of ? F. Then, the cumulative rotation angle corresponding to the highest frequency is obtained from the recorded cumulative rotation angles as a compensation angle:
[0108] After the compensation angle $\tilde{\theta}$ is obtained, the binary image $D^b$ before rotation is rotated by $\tilde{\theta}$ to output a corrected binary image $\tilde{D}^b$:
[0109] In Formula (12), $d_{(i,j)}^b$ represents the element in the $i$-th row and $j$-th column of the image matrix $D^b$, and $\tilde{d}_{(x,y)}^b$ represents the element in the $x$-th row and $y$-th column of the corrected image matrix $\tilde{D}^b$.
[0110] In one embodiment of the present disclosure, the to-be-corrected image D? is skewed clockwise by 8.5 degrees, and the to-be-corrected image D? may also be rotated counterclockwise by 8.5 degrees after the compensation angle $\tilde{\theta}$ of 8.5 degrees is obtained.
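Selecting the compensation angle from the recorded pairs of cumulative rotation angle and extracted frequencies can be sketched in Python as follows. The record layout, the function name `compensation_angle`, and the sample numbers are illustrative assumptions and are not measurements from the disclosure; ties are resolved in favour of the first angle encountered, which the text does not specify.

```python
from typing import List, Optional, Tuple


def compensation_angle(records: List[Tuple[float, List[float]]]) -> Optional[float]:
    """Return the cumulative angle whose extracted frequencies contain the maximum."""
    best_angle: Optional[float] = None
    best_freq = float("-inf")
    for theta, freqs in records:
        for f in freqs:
            if f > best_freq:      # strict comparison keeps the first maximum
                best_angle, best_freq = theta, f
    return best_angle


# Made-up records: (cumulative angle in degrees, frequencies extracted at that angle).
sample_records = [
    (8.0, [1.2]),
    (8.5, [1.5, 3.0]),
    (9.0, [2.0]),
    (69.5, []),
]
angle = compensation_angle(sample_records)
```

Angles at which no frequency satisfied the preset conditions contribute nothing to the search, and the function returns `None` if no angle produced any frequency.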
[0111] The above technical solution is described in detail in combination with application examples as follows:
[0112] A to-be-corrected image D? is obtained and converted into a binary image D.sup.b. The binary image D.sup.b is skewed clockwise by 8.5 degrees. The binary image D.sup.b is rotated counterclockwise by 180 degrees with a step size of 0.5 degrees:
[0113] When the binary image D.sup.b is rotated counterclockwise by 0.5 degrees, an example diagram of the current binary image D.sup.t is shown in
[0114] When the binary image D.sup.b is continuously rotated counterclockwise to 8.5 degrees, an example diagram of the current binary image D.sup.t is shown in
[0115] When the binary image D.sup.b is continuously rotated counterclockwise to 69.5 degrees, an example diagram of the current binary image D.sup.t is shown in
[0116] As shown in
[0117] It can be seen that, when the text in the binary image $D^t$ is horizontal, the extracted frequency is the highest among all the frequencies generated in the rotation process, which verifies the rationality of the text image correction method.
[0118] In order to realize the text image correction method provided by the present disclosure, the present disclosure also provides a text image correction apparatus: As shown in
[0119] In the text image correction apparatus, the processor 82 reads a computer program in the memory 81 to perform the following operations:
[0120] preprocessing to-be-corrected images into binary images;
[0121] sequentially rotating the binary images in the same direction with a predetermined step size, recording a cumulative rotation angle upon each rotation, and calculating a row cumulant of a current binary image until the current binary image is rotated to a threshold angle;
[0122] extracting a frequency satisfying preset conditions for the row cumulant of each binary image; and
[0123] correcting the to-be-corrected images by using the cumulative rotation angle corresponding to the maximum frequency among the frequencies of the binary images satisfying the preset conditions as a compensation angle.
[0124] The text image correction method and the text image correction apparatus provided by the present disclosure may estimate the compensation angle of the skew angle more accurately, and may correct skewed text and pictures. Frequency information of a row cumulative curve used is sensitive to an error between a compensation angle of a skew angle and a true skew angle, and therefore, the robustness is excellent. The text image correction method and the text image correction apparatus may be applied to image preprocessing, automatic angle compensation for scanned text and images, automatic compensation for phone camera skew angles, and other scenarios.
[0125] The text image correction method and apparatus provided by the present disclosure are described in detail above. Any obvious change made to the present disclosure without departing from the essence of the present disclosure will constitute infringement of the patent right of the present disclosure, and those of ordinary skill in the art will bear corresponding legal responsibilities.