Method and apparatus employing font size determination for resolution-independent rendered text for electronic documents

12586397 · 2026-03-24

Abstract

Method and apparatus for determining font point size in bitmapped text does not rely on accuracy of an optical character recognition (OCR) engine, or on generation of heuristics (e.g. assumption of certain amounts of different types of text, such as capital, lowercase, ascending, descending) to determine a likely font size. A deep learning model for determining text size is based on extraction of features from existing text to obtain a more general solution.

Claims

1. A method comprising: a) responsive to receipt of a bitmap of a text line, performing a vertical histogram of pixels in the text line; b) scaling the bitmap of the text line to create an image; c) generating a horizontal histogram of the image to create an original feature vector; d) applying a transform to the image to create a further image; e) generating a horizontal histogram of the further image to create a further feature vector; f) concatenating the original feature vector with the further feature vector to create an output feature vector which is input to a deep learning model; g) applying the output feature vector to obtain a normalized point size; h) responsive to document parameters, determining a target resolution; i) responsive to h), determining a scaling factor; and j) responsive to i), determining a target point size.

2. The method of claim 1, further comprising removing, from the vertical histogram, lines with zero pixel values.

3. The method of claim 1, further comprising repeating d) and e) to generate a plurality of further feature vectors, and wherein f) comprises concatenating all of the further feature vectors with the original feature vector to create the output feature vector.

4. The method of claim 1, wherein the target point size is calculated as follows: $S_T = S_N \cdot \left(\frac{R_G}{R_T}\right) \cdot \mathit{SF}$, where S.sub.T is the target point size; S.sub.N is the normalized point size; R.sub.G is a ground truth resolution; R.sub.T is the target resolution; and SF is the scale factor.

5. The method of claim 1, wherein the transform comprises a Gabor filter.

6. The method of claim 3, wherein repeating d) comprises applying a plurality of Gabor filters to generate the plurality of further feature vectors.

7. The method of claim 1, wherein the deep learning model comprises a model selected from the group consisting of convolutional neural networks (CNN), deep convolutional neural networks (DCNN), and Gabor Convolutional Networks (GCN).

8. The method of claim 1, further comprising training the deep learning model using bitmap text of known point size.

9. The method of claim 8, further comprising rendering the bitmap text at a ground truth resolution and scaling the bitmap text to a target scale height.

10. The method of claim 9, further comprising scaling a known point size of the bitmap text based on a ratio of a height of a bounding box for a line of the bitmap text to the target scale height.

11. An apparatus comprising at least one processor and at least one non-transitory memory storing instructions which, when executed by the at least one processor, perform a method comprising: a) responsive to receipt of a bitmap of a text line, performing a vertical histogram of pixels in the text line; b) scaling the bitmap of the text line to create an image; c) generating a horizontal histogram of the image to create an original feature vector; d) applying a transform to the image to create a further image; e) generating a horizontal histogram of the further image to create a further feature vector; f) concatenating the original feature vector with the further feature vector to create an output feature vector which is input to a deep learning model; g) applying the output feature vector to obtain a normalized point size; h) responsive to document parameters, determining a target resolution; i) responsive to h), determining a scaling factor; and j) responsive to i), determining a target point size.

12. The apparatus of claim 11, wherein the method further comprises removing from the vertical histogram, lines with zero pixel values.

13. The apparatus of claim 11, wherein the method further comprises repeating d) and e) to generate a plurality of further feature vectors, and wherein f) comprises concatenating all of the further feature vectors with the original feature vector to create the output feature vector.

14. The apparatus of claim 11, wherein the target point size is calculated as follows: $S_T = S_N \cdot \left(\frac{R_G}{R_T}\right) \cdot \mathit{SF}$, where S.sub.T is the target point size; S.sub.N is the normalized point size; R.sub.G is a ground truth resolution; R.sub.T is the target resolution; and SF is the scale factor.

15. The apparatus of claim 11, wherein the transform comprises a Gabor filter.

16. The apparatus of claim 11, wherein repeating d) comprises applying a plurality of Gabor filters to generate the plurality of further feature vectors.

17. The apparatus of claim 11, wherein the deep learning model comprises a model selected from the group consisting of convolutional neural networks (CNN), deep convolutional neural networks (DCNN), and Gabor Convolutional Networks (GCN).

18. The apparatus of claim 11, wherein the method further comprises training the deep learning model using bitmap text of known point size.

19. The apparatus of claim 18, wherein the method further comprises rendering the bitmap text at a ground truth resolution and scaling the bitmap text to a target scale height.

20. The apparatus of claim 19, wherein the method further comprises scaling a known point size of the bitmap text based on a ratio of a height of a bounding box for a line of the bitmap text to the target scale height.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) Aspects of the present invention now will be described in detail with reference to embodiments, as depicted in the accompanying drawings, in which:

(2) FIGS. 1A-1E show examples of letters, words, and diacritical marks to exemplify issues in point size determination;

(3) FIG. 2 is a high level diagram of process flow according to an embodiment;

(4) FIG. 3 is a high level block diagram according to an embodiment;

(5) FIG. 4 is a high level block diagram of portions of FIG. 3 according to an embodiment; and

(6) FIG. 5 is a high level block diagram of portions of FIG. 3 according to an embodiment.

DETAILED DESCRIPTION

(7) In aspects of the present invention, efficiency and/or performance is not affected by the need to decipher different types of text lines containing different characters, e.g. all upper case letters versus all lower case letters versus all letters with descenders or all letters with ascenders, combinations of any or all of the foregoing, etc.

(8) Embodiments of the invention employ machine learning (ML) systems which are trained on large sets of bitmapped text, all or substantially all of this bitmapped text being of known point size. The training data may be rendered at a specific resolution R.sub.G (ground truth), with text scaled to a target scale height H.sub.N. In one aspect, known point size for each training input may be scaled based on a ratio of a rendered text bounding box height H.sub.0 to the target scale height H.sub.N. With this approach, model outputs may be scaled to this target scale height, and thus can be used independently from a target resolution. In embodiments, automatically-generated data with the indicated specific resolution and scaled text height may be used to train the system and thus promote identification of font size in processed bitmapped text. Such identification facilitates generation of electronic documents in word processing software such as Microsoft's Office 365, Google's G Suite, Apache OpenOffice, LibreOffice, and the like.
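The label scaling described in the preceding paragraph may be sketched as follows. This is a minimal illustration only, not the training procedure itself. The function name `normalize_label` is hypothetical, and the direction of the ratio (multiplying the known point size by H.sub.N / H.sub.0, so that the label follows the rescaled glyphs) is an assumption.

```python
def normalize_label(point_size: float, h0: float, hn: float) -> float:
    """Scale a known point size to the normalized text height.

    Assumption: a line rendered with bounding-box height h0 (pixels) is
    resized to height hn, so its label is scaled by the same factor hn / h0.
    """
    if h0 <= 0 or hn <= 0:
        raise ValueError("heights must be positive")
    return point_size * hn / h0


# A 12 pt line whose rendered bounding box is 48 px tall, normalized to 32 px.
label = normalize_label(12.0, h0=48, hn=32)
print(label)  # 8.0
```

Under this assumption, every training label refers to the same normalized height, which is what allows a model output to be rescaled later for any target resolution.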

(9) One desired effect of aspects of the invention is to provide consistent font size in electronic documents. For example, font size may be tailored for an electronic document on standard letter sized paper (8.5 × 11 inches), with standard margins (1 inch all around). 12 point fonts may be considered to be most common. Accordingly, it may be desirable to determine a target point size that achieves the desired font size.

(10) In embodiments, the automatically-generated data to train an ML system may come from various types of transforms of bitmapped text. Gabor filters are one example of a class of such transforms. Various parameters in Gabor filters may be altered to yield different transforms. Altering one or more of these parameters may facilitate automatic generation of training data for the ML system.

(11) One ML system category is known as Gabor Convolutional Networks (GCN or GCNN), which may incorporate Gabor filters into deep convolutional neural networks (DCNN). In different aspects, Gabor filters may be employed with various deep learning architectures. As alluded to herein, scaling can be an issue in identifying point size for bitmapped text. GCNs are known to be able to cope with data in which scale can change frequently.

(12) With a system trained as just described, bitmapped text may be input to the system and processed to output a normalized point size S.sub.N. In embodiments, process flow may proceed as follows.

(13) Referring to FIG. 2, at 205 a bitmap of a line of text may be input into the system. Ordinarily skilled artisans will appreciate that this bitmapped text may be the result of different kinds of image processing and/or image analysis. In an embodiment, it may be assumed that the bitmapped text has a uniform point size. Text lines may have different lengths for different reasons, such as spaces between characters or words, numbers of characters or words, or the like. As one step in point size determination, at 210 a vertical histogram plot of pixels in a text line may be taken. Lower case and upper case characters, characters with ascenders, characters with descenders, characters with different accents and/or diacritical marks or punctuation, and the like may have different numbers of pixels in a given vertical line. Some vertical lines may be blank. In an embodiment, at 215 these blank vertical lines (whitespaces) may be removed, enabling more general operation of the model, and also making it possible to obtain the same results irrespective of presence of text spaces. There are instances in which different characters in different alphabets (for example, Chinese-Japanese-Korean, or CJK alphabet) may have different sizes of spaces between characters. In an embodiment, scaling also reduces an impact of shorter text segments as compared with longer text segments. This impact may be understood as treating characters in short segments as though they were repeated characters in longer segments.
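Merely by way of example, the vertical histogram at 210 and the removal of blank vertical lines at 215 may be sketched as below. The sketch assumes a binarized bitmap stored as rows of 0/1 values (1 being ink). The names `vertical_histogram` and `remove_blank_columns` are hypothetical.

```python
from typing import List

Bitmap = List[List[int]]  # rows of pixels; 1 = ink, 0 = background


def vertical_histogram(bitmap: Bitmap) -> List[int]:
    """Count ink pixels in each column (vertical line) of the text line."""
    return [sum(col) for col in zip(*bitmap)]


def remove_blank_columns(bitmap: Bitmap) -> Bitmap:
    """Drop every column whose vertical-histogram value is zero."""
    hist = vertical_histogram(bitmap)
    keep = [x for x, count in enumerate(hist) if count > 0]
    return [[row[x] for x in keep] for row in bitmap]


line = [
    [1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
]
print(vertical_histogram(line))    # [3, 0, 1, 3, 0]
print(remove_blank_columns(line))  # [[1, 0, 1], [1, 1, 1], [1, 0, 1]]
```

Because the blank columns are dropped before any scaling, two lines that differ only in inter-character or inter-word spacing yield the same bitmap.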

(14) At 220, normalized dimensions for text width and height (e.g. W.sub.N, H.sub.N) may be selected for text line analysis. The particular values are not critical to implementation of the invention. Merely by way of example, values for W.sub.N and H.sub.N may be in powers of 2, with W.sub.N being in turn a multiple of H.sub.N. For example, W.sub.N could be 256 or 512, and H.sub.N could be 32 or 64. Other values, and other numerical relationships between W.sub.N and H.sub.N may be possible, depending on the embodiment.

(15) At 225, after selection of normalized text width and height dimensions, scaling of the bitmap, perhaps modified as discussed with respect to earlier-described actions, to dimensions such as W.sub.N × H.sub.N may be effected, to create a bitmapped image M.sub.0. At 230, this bitmapped image M.sub.0 in turn may have a horizontal histogram generated for it, to create a feature vector F.sub.0. At 235-255, a loop may be provided in which, in different loop iterations, different transforms may be applied to bitmapped image M.sub.0 to create different images M, for which in turn respective horizontal histograms may be generated to create different feature vectors F. In 235-255, there are N such transforms which may be applied, so that there may be N iterations of the loop, resulting in creation of different images M.sub.1-M.sub.N, and different generated horizontal histograms and resulting feature vectors F.sub.1-F.sub.N.
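The scaling at 225 and the horizontal histogram at 230 may be illustrated with the following sketch. Two points are assumptions: nearest-neighbour sampling is used for the resize (no interpolation method is specified here), and the horizontal histogram is taken to be the sum of pixel values along each row, by analogy with the vertical histogram. The function names are hypothetical, and the toy dimensions stand in for values such as W.sub.N = 256 and H.sub.N = 32.

```python
from typing import List

Bitmap = List[List[float]]


def scale_nearest(bitmap: Bitmap, width: int, height: int) -> Bitmap:
    """Resize to width x height with nearest-neighbour sampling."""
    src_h, src_w = len(bitmap), len(bitmap[0])
    return [
        [bitmap[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def horizontal_histogram(image: Bitmap) -> List[float]:
    """Sum pixel values along each row, giving one value per row."""
    return [sum(row) for row in image]


line = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]
m0 = scale_nearest(line, width=8, height=4)  # toy W_N = 8, H_N = 4
f0 = horizontal_histogram(m0)                # feature vector F_0
print(f0)  # [4, 4, 8, 8]
```

With this reading, the feature vector has one entry per row, so its length depends only on the normalized height and not on the length of the input line.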

(16) In an embodiment, the transforms may include Gabor filters or other types of bandpass filters to process bidimensional signals such as bitmaps. Ordinarily skilled artisans will appreciate that Gabor filters are special types of short-time Fourier transforms. In an embodiment, a feature vector may be generated by convolving a bitmapped image with a Gabor filter. Ordinarily skilled artisans also will appreciate that Gabor filters have a number of parameters, variations of which yield different filters, and hence different transforms which may form some or all of the N transforms discussed with respect to 235-255 in FIG. 2.

(17) An example of a Gabor filter is as follows:

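(18) Complex: $g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \exp\left(i\left(2\pi\frac{x'}{\lambda} + \psi\right)\right)$

Real: $g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cos\left(2\pi\frac{x'}{\lambda} + \psi\right)$

Imaginary: $g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \sin\left(2\pi\frac{x'}{\lambda} + \psi\right)$

where $x' = x\cos\theta + y\sin\theta$ and $y' = -x\sin\theta + y\cos\theta$, and where the following parameters control the shape and size of the Gabor function: $\lambda$ = wavelength of the sinusoidal component; $\theta$ = orientation of the normal to the parallel stripes of the Gabor function; $\psi$ = phase offset of the sinusoidal function; $\sigma$ = sigma/standard deviation of the Gaussian envelope; $\gamma$ = spatial aspect ratio, which specifies the ellipticity of the support of the Gabor function.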
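As a hedged illustration, the real part of the Gabor function may be sampled on a small square grid to obtain a convolution kernel. The sketch uses the conventional rotated coordinates x′ = x cos θ + y sin θ and y′ = −x sin θ + y cos θ. The function name `gabor_kernel`, the argument names, and the choice of an odd-sized grid centred on the origin are assumptions made for the example.

```python
import math
from typing import List


def gabor_kernel(size: int, lam: float, theta: float, psi: float,
                 sigma: float, gamma: float) -> List[List[float]]:
    """Real part of a Gabor function sampled on a size x size grid."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the grid has a centre sample")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the sampling coordinates by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope multiplied by the sinusoidal carrier.
            envelope = math.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
            row.append(envelope * math.cos(2 * math.pi * xr / lam + psi))
        kernel.append(row)
    return kernel


k = gabor_kernel(size=5, lam=4.0, theta=0.0, psi=0.0, sigma=2.0, gamma=0.5)
print(len(k), len(k[0]))  # 5 5
print(k[2][2])            # 1.0
```

Calling the function with different values of the five parameters yields different kernels, and hence different transforms.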

(19) Varying one or more of these parameters will yield different values of the Gabor function, and therefore different transforms.

(20) In embodiments, the Gabor filter(s) and other transforms usefully may be employed in a neural network such as a convolutional neural network (CNN).

(21) After all of the feature vectors have been generated, at 260 all of the feature vectors F.sub.0-F.sub.N may be concatenated (linked together) to produce a feature vector F.sub.A. At 265, this feature vector F.sub.A may be input to the model, so that at 270, a normalized point size S.sub.N may be output.
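The loop at 235-255 and the concatenation at 260 may be sketched as follows, under several assumptions: each transform is represented as a small kernel applied to image M.sub.0 by zero-padded filtering, the horizontal histogram is the per-row sum, and the trivial 1 × 1 kernels in the usage example merely stand in for Gabor kernels. All function names are hypothetical.

```python
from typing import List, Sequence

Image = List[List[float]]


def horizontal_histogram(image: Image) -> List[float]:
    """Sum pixel values along each row."""
    return [sum(row) for row in image]


def filter_same(image: Image, kernel: Image) -> Image:
    """Zero-padded 2-D correlation with same-size output (odd kernel sides)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel) // 2, len(kernel[0]) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j, krow in enumerate(kernel):
                for i, k in enumerate(krow):
                    yy, xx = y + j - kh, x + i - kw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += k * image[yy][xx]
            out[y][x] = acc
    return out


def build_feature_vector(m0: Image, kernels: Sequence[Image]) -> List[float]:
    """Concatenate F_0 with one histogram per transformed image."""
    features = horizontal_histogram(m0)        # F_0
    for kernel in kernels:                     # transforms 1..N
        m_i = filter_same(m0, kernel)          # further image M_i
        features += horizontal_histogram(m_i)  # append F_i
    return features                            # F_A


m0 = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
f_a = build_feature_vector(m0, kernels=[[[1.0]], [[2.0]]])
print(f_a)  # [1.0, 3.0, 1.0, 3.0, 2.0, 6.0]
```

In this sketch the length of the concatenated vector is (N + 1) times the image height, so the input size of the model is fixed once the normalized height and the number of transforms are chosen.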

(22) At 275, it being noted that the model was trained to have a resolution R.sub.G (ground truth resolution), a target resolution R.sub.T for the bitmapped text may be calculated, based on dimensions of the target document as was mentioned earlier, and also on the pixel dimensions of the input image. The resolution R.sub.T may be different for different document dimensions, such as letter size, legal size, tabloid size, A4 size, and the like. The resolution R.sub.T also may be different for different document margins, and/or for different combinations of document dimensions and document margins. At 280, from the target resolution R.sub.T, a scale factor S.sub.F may be calculated. In an embodiment, a ratio of text line bounding box height H.sub.0 to normalized height H.sub.N may determine the scale factor S.sub.F.
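One possible reading of the calculations at 275 and 280 is sketched below. The formula for the target resolution is an assumption, not taken from the text: it divides the pixel width of the input page image by the printable width of the target document (page width less left and right margins). The scale factor follows the ratio of bounding box height H.sub.0 to normalized height H.sub.N mentioned above. Function and parameter names are hypothetical.

```python
def target_resolution(image_width_px: int, page_width_in: float,
                      margin_in: float) -> float:
    """Assumed R_T in pixels per inch: image width over printable width."""
    printable_in = page_width_in - 2 * margin_in
    if printable_in <= 0:
        raise ValueError("margins leave no printable width")
    return image_width_px / printable_in


def scale_factor(h0: float, hn: float) -> float:
    """S_F as the ratio of text-line bounding-box height to normalized height."""
    if hn <= 0:
        raise ValueError("normalized height must be positive")
    return h0 / hn


# Letter-sized page (8.5 in wide) with 1 in margins, input image 1950 px wide.
r_t = target_resolution(1950, page_width_in=8.5, margin_in=1.0)
s_f = scale_factor(h0=48, hn=32)
print(r_t, s_f)  # 300.0 1.5
```

Different page sizes or margins change the printable width and therefore the assumed target resolution, consistent with the dependence on document dimensions and margins described above.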

(23) At 285, then, using the scale factor S.sub.F, in an embodiment a target point size S.sub.T may be computed using the normalized point size S.sub.N and a ratio of the resolution R.sub.G of the model to the target resolution R.sub.T, as follows:

(24) $S_T = S_N \cdot \left(\frac{R_G}{R_T}\right) \cdot S_F$

(25) In an embodiment, this target point size S.sub.T may be used in the electronic document. Depending on the electronic document and on the margins in the electronic document, the target point size may be rounded up or down, to a whole point size or to a partial point size. The rounding may be appropriate or necessary because of point size deviations resulting from scaling operations performed both in training and in inference. In an embodiment, such rounding may help to keep the output in the electronic document more consistent.
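The computation at 285, together with the rounding just described, may be sketched as follows. The sketch assumes that the target point size is the product of the normalized point size S.sub.N, the resolution ratio R.sub.G / R.sub.T, and the scale factor S.sub.F. The rounding step of half a point is an illustrative choice, and the function name is hypothetical.

```python
def target_point_size(s_n: float, r_g: float, r_t: float, s_f: float,
                      step: float = 0.5) -> float:
    """Compute S_T = S_N * (R_G / R_T) * S_F, rounded to a multiple of step."""
    if r_t <= 0 or step <= 0:
        raise ValueError("target resolution and step must be positive")
    raw = s_n * (r_g / r_t) * s_f
    return round(raw / step) * step


# Normalized size 8.0, model and target both at 300 ppi, scale factor 1.5.
print(target_point_size(8.0, r_g=300.0, r_t=300.0, s_f=1.5))  # 12.0
# Small deviations from scaling are absorbed by the rounding.
print(target_point_size(8.1, r_g=300.0, r_t=300.0, s_f=1.5))  # 12.0
```

Setting `step` to 1.0 would round to whole point sizes instead of partial ones.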

(26) Finally, at 290, after computing the target point size S.sub.T, that target point size may be output, and may be used in the electronic document or documents.

(27) FIG. 3 is a high level block diagram of a computing system 300 which may implement a deep learning system 320, trained on known data to provide font point size based on a ground truth resolution R.sub.G, among other things. Depending on the embodiment, bitmapped text input 310 may comprise a library of bitmapped text strings of different known lengths and different known characteristics. In an embodiment, bitmapped text input 310 may come from any number of sources, including not only live sources such as scanners, cameras, or other imaging equipment which can provide bitmapped images of known text sequences, but also canned sources such as libraries.

(28) Processing system 350 may be a separate system, or it may be part of bitmapped text input 310, or may be part of deep learning system 320, depending on the embodiment. Processing system 350 may include one or more processors, one or more storage devices, and one or more solid-state memory systems (which are different from the storage devices, and which may include both non-transitory and transitory memory).

(29) Depending on the embodiment, processing system 350 may include deep learning system 320 or may work with deep learning system 320. In other embodiments, any one or more of blocks 331-334 may implement its own deep learning system 320. In embodiments, each of the blocks 331-334 may include one or more processors, one or more storage devices, and one or more solid-state memory systems (which are different from the storage devices, and which may include both non-transitory and transitory memory). In embodiments, additional storage 360 may be accessible to one or more of text height determination block 331 or bounding box generation block 332 and processing system 350 over a communications network 340, which may be a wired or a wireless network or, in an embodiment, the cloud.

(30) In an embodiment, storage 360 may contain training data for the one or more deep learning systems in one or more of blocks 320, 331-334, or 350. Storage 360 may store bitmapped text from input 310.

(31) Where communications network 340 is a cloud system for communication, one or more portions of computing system 300 may be remote from other portions. In an embodiment, even where the various elements are co-located, network 340 may be cloud-based.

(32) FIG. 4 is a high level diagram of apparatus for weighting of nodes in a deep learning system according to an embodiment. As training of a deep learning system proceeds according to an embodiment, the various node layers 420-1, . . . , 420-N may communicate with node weighting module 410, which calculates weights for the various nodes, and with database 450, which stores weights and data. As node weighting module 410 calculates updated weights, these may be stored in database 450.

(33) FIG. 5 is a high level diagram of apparatus to operate a deep learning system according to an embodiment. In FIG. 5, one or more CPUs 510 may communicate with CPU memory 520 and non-volatile storage 550. One or more GPUs 530 may communicate with GPU memory 540 and non-volatile storage 550. Generally speaking, a CPU may be understood to have a certain number of cores, each with a certain capability and capacity. A GPU may be understood to have a larger number of cores, in many cases a substantially larger number of cores than a CPU. In an embodiment, each of the GPU cores may have a lower capability and capacity than that of the CPU cores, but may perform specialized functions in the deep learning system, enabling the system to operate more quickly than if CPU cores were being used.

(34) Depending on the embodiment, one or more of deep learning system 320, vertical histogram generation block 331, feature vector generation block 332, bounding box generation block 333, target point size generation block 334, or processing system 350 may employ some or all of the apparatus shown in FIG. 5.

(35) While the foregoing describes embodiments according to aspects of the invention, the invention is not to be considered as limited to those embodiments or aspects. Ordinarily skilled artisans will appreciate variants of the invention within the scope and spirit of the appended claims.