BOOK SCANNING USING MACHINE-TRAINED MODEL
20220207668 · 2022-06-30
Inventors
CPC classification
H04N1/387
ELECTRICITY
G06V30/1914
PHYSICS
G06V10/247
PHYSICS
International classification
Abstract
This application discloses a technology for flattening a photographed page of a book and straightening texts therein. The technology uses one or more mathematical models to represent a curved shape of the photographed page with certain parameters. The technology also uses one or more photographic image processing techniques to dewarp the photographed page using the parameters of the curved shape. The technology uses one or more additional parameters that represent certain features of the photographed page to dewarp the photographed page.
Claims
1-19. (canceled)
20. A method of preparing an input-output data pair for training a machine-trainable model, the method comprising: providing a markings page as part of a bound book, wherein the markings page comprises a plurality of predetermined markings that are added to the bound book for training the machine-trainable model; providing, for at least part of the plurality of predetermined markings, a 2-dimensional (2D) location of each predetermined marking on the markings page at an unbound state thereof; capturing, using a camera, a photographic image of the markings page in an open state thereof in the bound book in which the markings page is curled from the unbound state, wherein the markings page appearing on the photographic image is distorted from the unbound state at least due to curling of the markings page in the open state at the time of capturing the photographic image; processing the photographic image to obtain distorted 2D locations of at least part of the plurality of predetermined markings on the photographic image; and computing a set of parameters representing distortion of the markings page on the photographic image, wherein the set of parameters comprises a first subset of parameters that relates to curling of the markings page in the open state from the unbound state, wherein computing the first subset of parameters involves a process of iteration comprising: assigning values to the first subset of parameters, simulating, using the assigned values to the first subset of parameters, warping of the markings page from the unbound state and/or unwarping of the markings page from the photographic image, determining if at least one of the simulated warping and the simulated unwarping using the assigned values corresponds to an amount of curling of the markings page, when determined that at least one of the simulated warping and the simulated unwarping does not correspond to the amount of curling of the markings page, assigning updated values to the first subset
of parameters such that at least one of the simulated warping and the simulated unwarping using the updated values would likely be more corresponding to the amount of curling than the at least one of the previously simulated warping and the previously simulated unwarping, subsequently determining that at least one of the simulated warping and the simulated unwarping using further updated values corresponds to the amount of curling of the markings page, and associating the further updated values as the first subset of parameters with the photographic image of the markings page or a modified version of the photographic image such that the photographic image of the markings page or the modified version is part of an input of the input-output data pair and the further updated values are part of an output of the input-output data pair.
21. The method of claim 20, wherein the process of iteration comprises: assigning the values to the first subset of parameters, simulating, using the assigned values to the first subset of parameters, warping of the markings page from the unbound state, determining if the simulated warping using the assigned values conforms to the curling of the markings page in the open state from the unbound state, when determined that the simulated warping does not conform to the curling of the markings page, assigning updated values to the first subset of parameters such that the simulated warping using the updated values would likely be more conforming to the curling than the previously simulated warping, subsequently determining that the simulated warping using further updated values conforms to the curling of the markings page, and associating the further updated values as the first subset of parameters with the photographic image of the markings page or a modified version of the photographic image such that the photographic image of the markings page or the modified version is part of an input of the input-output data pair and the further updated values are part of an output of the input-output data pair.
22. The method of claim 20, wherein the process of iteration comprises: assigning the values to the first subset of parameters, simulating, using the assigned values to the first subset of parameters, unwarping of the markings page from the photographic image, determining if the simulated unwarping using the assigned values conforms to the markings page at the unbound state, when determined that the simulated unwarping does not conform to the markings page at the unbound state, assigning updated values to the first subset of parameters such that the simulated unwarping using the updated values would likely be more conforming to the markings page at the unbound state than the previously simulated unwarping, subsequently determining that the simulated unwarping using further updated values conforms to the markings page at the unbound state, and associating the further updated values as the first subset of parameters with the photographic image of the markings page or a modified version of the photographic image such that the photographic image of the markings page or the modified version is part of an input of the input-output data pair and the further updated values are part of an output of the input-output data pair.
23. The method of claim 20, wherein the markings page appearing on the photographic image is distorted from the unbound state due to a 3-dimensional (3D) camera location relative to the markings page in addition to the curling of the markings page in the open state at the time of capturing the photographic image, wherein the set of parameters further comprises a second subset of parameters relating to the 3D camera location relative to the markings page at the time of capturing the photographic image.
24. The method of claim 20, wherein the markings page appearing on the photographic image is distorted from the unbound state due to a 3D camera orientation relative to the markings page at the time of capturing the photographic image in addition to the curling of the markings page in the open state at the time of capturing the photographic image, wherein the set of parameters further comprises a third subset of parameters relating to the 3D camera orientation of the camera relative to the markings page at the time of capturing the photographic image.
25. The method of claim 20, wherein the markings page appearing on the photographic image is distorted from the unbound state due to a 3-dimensional (3D) camera location relative to the markings page and due to a 3D camera orientation relative to the markings page at the time of capturing the photographic image in addition to the curling of the markings page in the open state at the time of capturing the photographic image, wherein the set of parameters further comprises a second subset of parameters relating to the 3D camera location relative to the markings page at the time of capturing the photographic image and a third subset of parameters relating to the 3D camera orientation of the camera relative to the markings page at the time of capturing the photographic image.
26. The method of claim 20, wherein the markings page at the unbound state is flat or substantially flat.
27. The method of claim 20, wherein the first subset of parameters includes two parameters representing a Bezier Curve.
28. The method of claim 20, wherein the set of parameters further comprises at least one page size parameter representing a size of the markings page in a flattened image that would be obtained by unwarping the photographic image of the markings page or a modified version of photographic image.
29. The method of claim 28, wherein the at least one page size parameter represents a width of the markings page in the flattened image relative to a width of the flattened image.
30. The method of claim 20, wherein, in computing the distorted 2D location for each of the plurality of predetermined markings on the distorted image, the distorted image corresponds to a photographic image of the markings page taken by a pinhole camera located at the 3D camera location relative to the markings page according to assigned value(s), wherein the pinhole camera has an intrinsic parameter matrix of
31. The method of claim 30, wherein at least one of the first subset of parameters defining the 3D camera location represents the pinhole camera's translation along an optical axis of the pinhole camera relative to the markings page, and further represents the focal length f of the pinhole camera such that the model does not provide a separate parameter representing the focal length f other than the set of parameters.
32. A method of preparing a machine-trained model, the method comprising: generating a plurality of input-output data pairs according to the method of claim 20; and training a machine-trainable model using the plurality of input-output data pairs to provide a machine-trained model such that the machine-trained model is configured to generate values for the set of parameters in response to an input of an image of an opened book page.
33. A non-transitory storage medium storing a plurality of instructions executable by a computer, wherein the plurality of instructions, when executed, causes the computer to generate a plurality of input-output data pairs according to the method of claim 32.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0069] Hereinafter, implementations of the present invention will be described with reference to the drawings. These implementations are provided for better understanding of the present invention, and the present invention is not limited only to the implementations. Changes and modifications apparent from the implementations still fall in the scope of the present invention. Meanwhile, the original claims constitute part of the detailed description of this application.
Need for Flattening an Opened Book Page
[0070] Sometimes book readers want to digitally store images of physical books. One way is to photograph individual book pages. When photographing pages of an opened book, photographed pages are often curved and texts are distorted. There are many mobile applications for photographing documents and digitally modifying the photographs. However, many such mobile applications do not effectively address distortion of texts on the photographed pages.
Flattening Opened Book Page
[0071] This application discloses a technology for flattening a photographed page of a book and straightening texts thereon. The technology uses one or more mathematical models to represent a curved shape of the photographed page. The technology also uses one or more photographic image processing techniques to flatten or dewarp the photographed page using certain parameters related to the curved shape.
Use of Artificial Intelligence
[0072] The technology uses one or more machine-trained models to obtain parameters for use in a dewarping or flattening process of the photographed page. A machine-trained model of the technology is configured to, in response to an input of data of a photographic image, output parameters for use in a dewarping or flattening process of the photographic image.
Data Set for Training Machine-Trainable Model
[0073] To prepare the machine-trained model, the technology first develops and prepares a data set for training of a machine-trainable model. The training data set includes a number of data pairs. Each pair includes input data for training the machine-trainable model and desirable output data (a label) from the model in response to the input data. For example, the input data is an image of a curved book page, and the desirable output data includes one or more parameters for use in obtaining a flattened image featuring a flat version of the curved book page.
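As a sketch of the data-pair structure described above (the class name and the 8-parameter label size are illustrative assumptions, not from the application):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TrainingPair:
    """One input-output pair: a page image and its correction parameters."""
    image: np.ndarray   # H x W x 3 input image (e.g. 192 x 144 RGB)
    params: np.ndarray  # desirable output data (label): correction parameters

# Hypothetical example: a 192x144 RGB image paired with an 8-element label
# (e.g. two Bezier z-coordinates, camera rotation/translation, a size ratio).
image = np.zeros((144, 192, 3), dtype=np.uint8)
params = np.zeros(8, dtype=np.float32)
pair = TrainingPair(image=image, params=params)
```

In practice the label holds whatever image correction parameters the application defines for the flattening process.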
Training of Machine-Trainable Model
[0074] The technology can use various training techniques to obtain a machine-trained model having a desirable performance. For example, training of a model is completed when, for each of input data of the training data set, output from the model is within a predetermined allowable range of error from the corresponding desirable output data (label) of the training data set.
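The completion criterion described above can be read, in one simple form, as an element-wise tolerance check between model outputs and labels; a minimal sketch (the function name and the data are illustrative):

```python
import numpy as np

def training_complete(model_outputs, labels, tolerance):
    """Training is complete when every output is within `tolerance`
    of its corresponding label, element-wise."""
    model_outputs = np.asarray(model_outputs)
    labels = np.asarray(labels)
    return bool(np.all(np.abs(model_outputs - labels) <= tolerance))

# Two outputs vs. two labels, each with two parameters.
outs = [[0.31, 0.48], [0.12, 0.90]]
labs = [[0.30, 0.50], [0.10, 0.88]]
print(training_complete(outs, labs, tolerance=0.05))  # True
```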
Image Flattening using Mobile Application
[0075] Once the machine-trained model is prepared, the machine-trained model is included in a computer program, e.g., a mobile application for a smartphone. When a user takes a photograph of a page of an opened book, the mobile application uses the machine-trained model to obtain a set of image correction parameters, and processes the photograph to generate a flattened image of the photographed page using the set of image correction parameters. The flattened image features straightened texts of the photographed page.
Process to Obtain Flattened Book Page Images
Acquiring Photograph of Opened Book Page
[0077] Referring to
Texts in Photograph
[0078] The target page 110 illustrates text lines 114 which are not actual lines but represent arrangements of texts. The photograph 120 is large and clear enough such that characters are legible in the photograph 120. For example, the photograph 120 is a color photograph having a resolution of 4096×3072 and an 8-bit color depth for each of the red, green and blue (RGB) channels. In embodiments, the photograph 120 may have one or more specifications different from the example size or the example color depth.
Distortions in Photograph
[0079] Typically, before being bound to the book, the target page 110 has a rectangular shape, and texts are aligned along straight, parallel, invisible lines on the page. However, when the book 100 is open, the target page 110 may be curved (curled or arched) depending upon its binding. Accordingly, the target page 122 in the captured photograph 120 may be distorted from its original flat rectangular shape as illustrated. The arrangement of the texts, i.e., the text lines 124 in the photograph 120, is curved accordingly.
Parameters to Define Distortions
[0080] The distortions in the photograph 120 may be defined by various distortion parameters. For example, one may define the page distortions in the photograph using (1) physical bending or warping of the target page 110 due to the book's binding, (2) the camera's position and orientation relative to the target page 110 when the photograph was taken, and (3) the camera's optical characteristic (for example, lens aberrations). One or more additional parameters may contribute to the page's distortion in the photograph 120.
Image Correction Parameters
[0081] The photograph 120 may be flattened using the distortion parameters to generate the flattened image 130. As the page's distortion is corrected, the flattened image 130 features a flattened version of the page 132 (flattened page), and texts are aligned along straight lines 134 in the flattened page. In an implementation, one or more parameters that are not directly related to or contributing to the page's distortion can be used for the image flattening process. How to define and obtain image correction parameters is discussed later in more detail.
Possible Direct Measurement of Image Correction Parameters
[0082] For example, the smartphone may use the camera's focusing mechanism to measure the camera's distance to a point of the target page 110. If the smartphone 200 has a 3D scanning system separate from the camera 210, it may directly measure the page's curved shape and obtain one or more image correction parameters representing the page's curved shape. As such, the smartphone 200 may use one or more sensors to obtain an image correction parameter directly without referring to the photograph 120. However, in an implementation, the smartphone 200 cannot or does not directly measure one or more image correction parameters.
Indirect Acquisition of Image Correction Parameters from Photograph
[0083] When the smartphone is not capable of directly measuring one or more image correction parameters, the smartphone 200 obtains the one or more image correction parameters indirectly from processing of the photograph 120. For example, (1) an iterative estimation and (2) a machine-trained model can be used to obtain one or more parameters from the photograph 120. In the alternative, one or more analysis techniques can be used to obtain an image correction parameter from the photograph.
Iterative Estimation May Be Impractical for Smartphone
[0084] The smartphone 200 may obtain one or more image correction parameters from the photograph 120 using an iterative estimation. In such an iterative estimation, one or more image correction parameters can be determined by repeating (1) evaluating a set of estimated parameters using one or more predetermined criteria and (2) updating one or more parameters in the set of estimated parameters based on the evaluation, until the one or more predetermined criteria are satisfied. For example, the smartphone 200 (a) generates a corrected version of the photograph 120 using a set of estimated image correction parameters, (b) evaluates if texts are aligned along straight lines in the corrected version, and (c) updates at least one of the estimated image correction parameters based on the evaluation, repeating the generation step (a) and the evaluation step (b) until finding a set of image correction parameters that makes text lines straight in the corrected version. However, performing such an iterative estimation on the smartphone 200 may not be desirable when it takes a long time (e.g. more than 1 second) to reach a final estimation due to the smartphone's limited computational power, and when the time to reach a final estimation varies significantly among different photographs.
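The (a)-(b)-(c) loop above can be sketched generically; every callable below is a placeholder for the application's real correction and evaluation routines, and the tolerance and update rule are illustrative assumptions:

```python
def estimate_parameters(photo, correct, straightness_error,
                        initial_params, update, tolerance=1e-3, max_iters=100):
    """Iterative estimation: (a) correct the photo with the current
    estimates, (b) evaluate how far text lines are from straight,
    (c) update the estimates; repeat until the criterion is met."""
    params = initial_params
    for _ in range(max_iters):
        corrected = correct(photo, params)       # step (a)
        error = straightness_error(corrected)    # step (b)
        if error <= tolerance:
            break
        params = update(params, error)           # step (c)
    return params

# Toy demonstration with stand-in callables: the "photo" is ignored and
# the straightness error is simply the distance of the estimate from 5.
best = estimate_parameters(
    photo=None,
    correct=lambda photo, p: p,
    straightness_error=lambda c: abs(c - 5),
    initial_params=0,
    update=lambda p, e: p + 1,
)
print(best)  # 5
```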
Machine-Trained Model
[0085] The smartphone 200 may run a machine-trained model 320 to obtain one or more image correction parameters from the photograph 120. Referring to
Input Image of Machine-Trained Model
[0086] In an implementation, for example, the input image 310 has a resolution of 192×144 and three color channels of red, green and blue (RGB) while the photograph has a resolution of 3200×2400 and has RGB channels. As such, the number of pixels in the input image 310 can be less than 1 percent of the number of pixels in the photograph. Using a smaller resolution for the input image 310 can be advantageous to reduce the number of internal parameters of the machine-trained model 320 and thereby to reduce the amount of computation for obtaining the image correction parameters.
[0087] As the machine-trained model 320 requires the input image 310 to satisfy a predetermined specification (the same specification as input images used for training the model), the photograph 120 is processed into the input image 310. The predetermined specification for the input image 310 may be different from the example, and may define one or more of pixel resolution, image format, and color channel.
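A minimal sketch of reducing a photograph to the fixed input specification; nearest-neighbor sampling is used here only for brevity (a real pipeline would filter before downsampling), and 192×144 is the example resolution given above:

```python
import numpy as np

def to_model_input(photo, out_h=144, out_w=192):
    """Downsample a photograph (H x W x 3 array) to the model's fixed
    input resolution by nearest-neighbor row/column selection."""
    in_h, in_w = photo.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return photo[rows][:, cols]

# A 3200x2400 photograph reduced to the 192x144 input specification.
photo = np.zeros((2400, 3200, 3), dtype=np.uint8)
inp = to_model_input(photo)
print(inp.shape)  # (144, 192, 3)
```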
Illegible Text in Small Input Image
[0088] When the photograph 120 is reduced to the input image 310 having, for example, a resolution of 192×144, characters printed on the target page 110 may not be legible or individually recognizable in the input image 310. For example, a legible character having a size of 30×30 pixels in the photograph 120 (having a resolution of 3200×2400, for example) becomes illegible to a human eye in the input image 310 when the character gets smaller than a minimum legible size (for example, 3×5 pixels) in the input image 310 having a resolution of 192×144.
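The legibility arithmetic above can be checked directly; the figures are the example values from the text:

```python
# Scaling a 30x30-pixel character from a 3200x2400 photograph
# down to a 192x144 input image (the example figures above).
scale = 192 / 3200        # same as 144 / 2400, i.e. 0.06
char_px = 30 * scale      # character size after reduction
print(round(char_px, 2))  # 1.8 -> below the example 3x5-pixel legible minimum
```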
[0089] However, the machine-trained model 320 does not need legible characters to output image correction parameters. Regardless of whether texts are legible in the input image 310, the machine-trained model 320 outputs image correction parameters good enough to correct the page's distortion in the photograph 120 and to obtain a rectangular page 132 (flattened page) of the flattened image 130.
Output of Machine-Trained Model
[0090] Referring to
[0091] Example output parameters of the machine-trained model 320 are described in more detail with reference to
Image Correction Parameter—Page Bending Parameter
[0092] In an implementation, the machine-trained model 320 outputs one or more parameters representing the page's bending (the page's curvature). To describe the page's bending with a limited number of parameters, a mathematical model is used in combination with one or more assumptions.
Parameter for Conversion Between Flat Surface and Curvature
[0093] In an implementation, the machine-trained model processes input image data corresponding to the captured photograph 120 and outputs one or more parameters relating a flat surface (flat page) and a curvature corresponding to a bowed book page 122. The one or more parameters are used to convert the flat surface to the curvature or to convert the curvature to the flat surface.
Curved Page Fits Cylindrical Surface
[0094] In an implementation, the page 110 is assumed to be a rectangular page when flat. It is also assumed that the page 110 curls from its flat rectangular shape to fit a cylindrical surface shown in
Coordinate System to Describe Page Curvature
[0095] Referring to
Same Cross-Section of Cylindrical Surface
[0096] Referring to
Bezier Curve
[0097] In an implementation, the curved line 150 is modeled using a Bezier curve. A Bezier curve may be defined using coordinates of its control points. Referring to
Relative Scale to Page Width
[0098] In representing the curved line 150 with the coordinates of the four control points (O.sub.p, P.sub.1, P.sub.2, and E.sub.p), the coordinates can be in a relative scale to the page width W. In a relative scale to the page width W, the coordinates of the origin O.sub.p and the right-bottom corner E.sub.p are fixed as (0, 0) and (1, 0) respectively. Accordingly, to define the curved line 150, we need only the coordinates for the other control points (P.sub.1, P.sub.2). In the alternative, a different scale can be used for the coordinates of the control points.
Two Parameters for Bezier Curve
[0099] When we set the x-coordinates of the points P.sub.1, P.sub.2 to ¼ and ¾ of the page width W, in addition to using a relative scale to the page width W, the Bezier curve line 150 can be represented using only two coordinate values (parameters), the z-axis coordinates Z.sub.1 and Z.sub.2 of the two points P.sub.1, P.sub.2 in a relative scale to the page width W. Referring to
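Under the assumptions above, a page cross-section can be evaluated from just Z.sub.1 and Z.sub.2. The control-point layout (0, 0), (1/4, z1), (3/4, z2), (1, 0) in (x, z) coordinates follows the description, and the cubic Bezier expansion is standard:

```python
import numpy as np

def page_profile(t, z1, z2):
    """Cross-section of the curled page as a cubic Bezier curve.
    Assumed control points: (0, 0), (1/4, z1), (3/4, z2), (1, 0)
    in (x, z), with x in a relative scale to the page width W."""
    t = np.asarray(t, dtype=float)[:, None]
    p0, p1, p2, p3 = (np.array([0.0, 0.0]), np.array([0.25, z1]),
                      np.array([0.75, z2]), np.array([1.0, 0.0]))
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

# Five (x, z) sample points along the cross-section for Z1=0.05, Z2=0.10.
pts = page_profile(np.linspace(0, 1, 5), z1=0.05, z2=0.10)
```

The curve starts at the page's left edge (0, 0) and ends at the right edge (1, 0), so the two free z-coordinates fully determine the bending.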
Camera Parameters Affecting Page's Curved Shape in Photograph
[0100] The photographed page's curved shape in the photograph 120 is affected by one or more parameters of the camera 210. The camera parameters include one or more intrinsic parameters (focal length, skew, offset) and one or more extrinsic parameters (camera rotation, camera translation). In an implementation, the machine-trained model 320 outputs one or more of the camera parameters for use in the image correction of the photograph 120. Example camera parameters are explained in detail with reference to
Coordinate System
[0101]
Pinhole Camera Model
[0102]
Ideal Pinhole Camera Model
[0103] A matrix of camera intrinsic parameters is

    [ f.sub.x   s         x.sub.o ]
    [ 0         f.sub.y   y.sub.o ]
    [ 0         0         1       ]
[0104] where f.sub.x and f.sub.y are focal lengths in pixel units, s is a skew parameter (skew coefficient) representing distortion of non-rectangular pixels, and x.sub.o and y.sub.o are offset parameters representing translations of the origin of imaging pixels relative to the pinhole.
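As a sketch, the intrinsic matrix can be applied to project a camera-space point into pixel coordinates; the numeric values below are illustrative assumptions, not from the application:

```python
import numpy as np

def intrinsic_matrix(fx, fy, s, xo, yo):
    """Pinhole intrinsic matrix with the parameters defined above:
    focal lengths fx, fy (pixels), skew s, and offsets xo, yo."""
    return np.array([[fx,  s,   xo],
                     [0.0, fy,  yo],
                     [0.0, 0.0, 1.0]])

# Project a camera-space point (X, Y, Z) to pixel coordinates.
K = intrinsic_matrix(fx=800.0, fy=800.0, s=0.0, xo=96.0, yo=72.0)
X = np.array([0.1, 0.05, 1.0])  # a point 1 unit in front of the pinhole
u, v, w = K @ X
print(u / w, v / w)  # 176.0 112.0
```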
[0105] In an ideal pinhole camera model of
Camera's Orientation Parameter (Camera Rotation)
[0106] The machine-trained model 320 outputs one or more parameters representing the camera's orientation relative to the target page 110. In an implementation, the camera's orientation can be represented using three angular parameters—roll, yaw and pitch of the camera 210 in the page coordinate system 410. In an implementation, the camera's orientation relative to the page 110 can be defined using three angles between axes of the page coordinate system 410 and the camera coordinate system 610. A first angle between the x-axis and the x.sub.c-axis, a second angle between the y-axis and the y.sub.c-axis, and a third angle between the z-axis and the z.sub.c-axis in combination represent the camera's orientation relative to the page. In the alternative, the camera's orientation can be defined in a way different from the example.
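One common way to realize three angular parameters as an orientation is a product of per-axis rotations; the composition order below is an assumption, since the text only names the three angles:

```python
import numpy as np

def rotation_matrix(roll, pitch, yaw):
    """Camera orientation from three angles (radians), composed as
    R = Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return rz @ ry @ rx

# Zero angles give the identity: camera axes aligned with page axes.
R = rotation_matrix(0.0, 0.0, 0.0)
```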
Camera's Position Parameter (Camera Translation)
[0107] The machine-trained model 320 outputs one or more parameters representing the camera's position relative to the target page 110. In implementations, the machine-trained model 320 outputs one or more of (1) an x-axis translation t.sub.x, (2) a y-axis translation t.sub.y, and (3) a z-axis translation t.sub.z of the camera 210 in the page coordinate system 410. In embodiments, the x-axis translation t.sub.x, the y-axis translation t.sub.y, and the z-axis translation t.sub.z are the x, y, z coordinates of the camera coordinate origin O.sub.c in the page coordinate system 410. In implementations, the machine-trained model 320 outputs one or more parameters representing the page's position relative to the camera 210. For example, the machine-trained model 320 outputs one or more of (1) an x-axis translation, (2) a y-axis translation, and (3) a z-axis translation of a point of the target page 110 in the camera coordinate system 610.
Camera Position Parameters
[0108] In an implementation, the photograph's size on the camera's image plane (x.sub.i-y.sub.i plane) is considered to compute one or more of the camera position parameters from the x-axis translation (t.sub.x) and the y-axis translation (t.sub.y). For example, the machine-trained model 320 outputs an x-axis translation parameter (t.sub.x_scale) defined by the following equation:
[0109] where w.sub.i is the photograph's width in pixels and t.sub.x is the x-axis translation of the camera.
[0110] For example, the machine-trained model 320 outputs a y-axis translation parameter (t.sub.y_scale) defined by the following equation:
[0111] where h.sub.i is the photograph's height in pixels, and t.sub.y is the y-axis translation of the camera.
Camera Focal Length Parameter
[0112] In an implementation, the machine-trained model 320 outputs one or more of the camera's parameters. For example, the machine-trained model 320 outputs a focal length parameter (f.sub.scale) defined by the following equation:
[0113] where w.sub.i and h.sub.i are the photograph's width and height in pixel units.
Z-Axis Translation and Camera Focal Length Parameter Combined in a Single Parameter
[0114] According to an ideal pinhole camera model of
Relative Scale for Camera Parameter
[0115] In the examples discussed above, the x-axis translation parameter (t.sub.x_scale), the y-axis translation parameter (t.sub.y_scale) and the focal length parameter (f.sub.scale) are defined in a relative scale to the photograph's size on the image plane in pixels. Using relative scales for camera parameters is advantageous to accommodate various sizes of photographs for an image flattening process to obtain a flattened image and for training of a machine-trainable model. In the alternative, one or more of the camera parameters can be defined without considering the photograph's size in pixels.
Page Size Ratio
[0116] In embodiments, the machine-trained model 320 outputs one or more parameters representing a size of the flattened page 132 in the flattened image 130.
No Detection of Page Edge to Remove Background
[0117] In embodiments, the flattened page 132 is of a rectangular shape having its sides parallel to the sides of the flattened image 130, and the center of the rectangular flattened page 132 is located at the center of the flattened image 130. Accordingly, when we know the page width ratio and the page height ratio, the background 136 can be removed just by trimming the flattened image 130 based on the ratios without a process to detect an edge of the flattened page 132 in the flattened image 130.
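The ratio-based trimming can be sketched as a centered crop; the image size and ratios below are illustrative:

```python
import numpy as np

def trim_background(flattened, width_ratio, height_ratio):
    """Crop the centered page out of the flattened image using the
    page width/height ratios, without any edge detection."""
    h, w = flattened.shape[:2]
    page_w = int(round(w * width_ratio))
    page_h = int(round(h * height_ratio))
    x0 = (w - page_w) // 2
    y0 = (h - page_h) // 2
    return flattened[y0:y0 + page_h, x0:x0 + page_w]

# A page occupying 80% of the flattened image in each dimension.
img = np.zeros((1000, 800, 3), dtype=np.uint8)
page = trim_background(img, width_ratio=0.8, height_ratio=0.8)
print(page.shape)  # (800, 640, 3)
```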
Image Flattening Process
[0118] Using one or more of the obtained image correction parameters, an image flattening process of the photograph 120 (or its equivalent) is performed to generate the output image 140. The image flattening process is a computational process to reverse the page's distortion based on one or more mathematical models and assumptions used for defining the image correction parameters (e.g. pinhole camera model, assumption of a Bezier Curve, and assumption that the page width ratio Δu/u.sub.1 and the page height ratio Δv/v.sub.1 are the same). In embodiments, the output image 140 can be generated without actually generating or storing the flattened image 130. A person having ordinary skill in the art would be able to configure a computational process of image flattening based on mathematical models and assumptions to define the image correction parameters.
Process to Prepare Machine-Trained Model
[0119] A process to prepare the machine-trained model 320 is discussed in detail with reference to
Training Data Set for Supervised Learning
Training Data Set Size
[0121] For example, 100,000 input-output data pairs can be used to prepare and configure the machine-trained model 320. In the alternative, the number of input-output data pairs can be less than or more than 100,000.
Process to Prepare Training Data Set (S910)
Printing Predetermined Layout of Guide Markings on Book Page (S1110)
[0124] In an implementation, guide markings are printed in a color (e.g. red) different from that of the texts (e.g. black) in the page 1210. In the alternative, guide markings can be printed in the same color as the texts in the page 1210, and can be printed in two or more colors.
Separating and Rebinding Book to Print Guide Markings
[0125] For example, a book is separated into individual sheets before printing guide markings. Then, guide markings are printed at their respective predetermined position on the separated individual sheets. After printing guide markings, the individual sheets are re-bound to a book (S1130). In the alternative, guide markings are printed on pages of a book without separating pages from the book.
Determining Position of Each Printed Mark When Markings Page is Flat (S1120)
[0126] Subsequent to printing the guide markings, the location of each mark on the markings page 1210 is determined. For example, the coordinates of a mark M.sub.21 on the markings page 1210 are measured using one or more measurement instruments when the page is placed flat. In an implementation, measurement of mark coordinates is performed when the page 1210 is a separate sheet and prior to being bound to a book. In the alternative, the coordinates of the mark M.sub.21 can be determined using data of a printing process of the guide markings without performing a measurement.
Obtaining Photographs of Markings Page (S1140)
[0127] After printing guide markings on book pages, a photograph is obtained for each markings page when the markings page is open and curved.
[0128] In an implementation, two or more photographs are taken for a single page while moving a camera relative to the page or changing the page's level of curling. In doing so, two or more pairs of input image and output data can be produced for the same page.
Obtaining Markings Page Photograph From Video
[0129] To obtain a number of photographs of markings pages efficiently, for example, a video is taken while turning pages of the book (and moving the book), and photographs of the markings pages are generated using one or more frames of the video. In the alternative, photographs of the markings pages can be obtained in a way different from the example.
Generating Training Input Image (S1150)
[0130] In an implementation, the input image 1010 has a resolution of 192×144 while the markings page photograph 1220 has a resolution of 3840×2160 (4K) such that the number of pixels in the input image 1010 is less than 1 percent of the number of pixels in the page photograph 1220. The markings page photograph 1220 is converted to the input image 1010. In the alternative, the markings page photograph 1220 can be used as a training input image without further processing.
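The resolution reduction described above can be checked with simple arithmetic. The sketch below (values taken from the disclosure) confirms that a 192×144 input image holds less than 1 percent of the pixels in a 3840×2160 photograph.

```python
# Resolution figures from the disclosure: the 4K markings-page photograph
# is downscaled to a 192x144 training input image.
PHOTO_W, PHOTO_H = 3840, 2160   # markings page photograph 1220 (4K)
INPUT_W, INPUT_H = 192, 144     # training input image 1010

photo_pixels = PHOTO_W * PHOTO_H
input_pixels = INPUT_W * INPUT_H
pixel_ratio = input_pixels / photo_pixels

print(f"input pixels: {input_pixels}")   # 27648
print(f"photo pixels: {photo_pixels}")   # 8294400
print(f"ratio: {pixel_ratio:.4%}")       # 0.3333%
assert pixel_ratio < 0.01  # less than 1 percent, as stated
```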
Removing Guide Markings to Generate Training Input Image
[0131] In an implementation, one or more guide markings are removed to generate the input image 1010 from the markings page photograph 1220. For example, guide markings are modified to have a color of the book page paper. Referring to
Additional Processing of Background to Generate Training Input Image
[0132] In an implementation, in generating the input image (input data) 1010 from the markings page photograph 1220, one or more features can be added, removed or modified. For example, a background of the markings page photograph 1220 (an area outside the curved page 1222) is modified using a predetermined color or pattern to distinguish the curved page 1222 further from the background.
Determining Position of Each Printed Mark in Markings Page Photograph (S1160)
[0133] In implementations, the photograph 1220 is analyzed to locate a center for each circular dot, and coordinates of the center are used as coordinates of the guide marking. In the alternative, a point other than the center can be used as a reference to determine coordinates of the guide marking on the x.sub.i-y.sub.i image plane.
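A minimal sketch of locating a dot center is given below. The representation of a dot as a list of pixel coordinates and the use of a plain centroid are assumptions for illustration; the disclosure does not fix a specific detection algorithm.

```python
# Assumed sketch: the pixels belonging to one circular guide marking are
# given as (x, y) coordinates, and the dot's center is taken as their
# centroid, which is then used as the marking's coordinates.

def marking_center(dot_pixels):
    """Return the centroid (x, y) of the pixels forming one circular dot."""
    n = len(dot_pixels)
    cx = sum(x for x, _ in dot_pixels) / n
    cy = sum(y for _, y in dot_pixels) / n
    return cx, cy

# A small synthetic dot: a 3x3 block of pixels centered at (10, 20).
dot = [(x, y) for x in (9, 10, 11) for y in (19, 20, 21)]
print(marking_center(dot))  # (10.0, 20.0)
```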
Determining Image Correction Parameters Corresponding to Input Image (S1170)
[0134] The 5×5 array of the guide markings is distorted from the flat markings page 1210 to the markings page photograph 1220 in accordance with the page's distortion. In an embodiment, one or more image correction parameters of the output data 1020 are determined based on relation between a layout of the guide markings in the flat markings page 1210 and a layout of the guide markings in the markings page photograph 1220. An example process to obtain one or more image correction parameters is described with reference to
Iterative Process to Provide Output Data
[0135]
Generating Distorted Image Using Current Estimation of Parameters (S1310)
[0136] Referring to
[0137] Using a current set of estimated parameters, the virtual flat page image 1420 is distorted to obtain a distorted image (simulated camera image) 1430. Distortion of the virtual flat page image 1420 is performed using an image formation simulation that is based on mathematical models and assumptions used for defining the image correction parameters (e.g. pinhole camera model, assumption of a Bezier curve line).
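The Bezier-curve assumption mentioned above can be illustrated as follows. In this assumed sketch, the curled page's cross-section is modeled as a cubic Bezier curve, so a flat coordinate in [0, 1] maps to a point on the curled surface; the control-point values are illustrative only.

```python
# Assumed sketch of the Bezier curve line assumption: evaluate a cubic
# Bezier curve (2D control points) to model the curled page cross-section.

def cubic_bezier(t, p0, p1, p2, p3):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    z = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return x, z

# Illustrative control points for a gently curled page cross-section.
P0, P1, P2, P3 = (0.0, 0.0), (0.3, 0.15), (0.7, 0.15), (1.0, 0.0)

print(cubic_bezier(0.0, P0, P1, P2, P3))  # (0.0, 0.0) -> left page edge
print(cubic_bezier(1.0, P0, P1, P2, P3))  # (1.0, 0.0) -> right page edge
print(cubic_bezier(0.5, P0, P1, P2, P3))  # midpoint lifted off the flat plane
```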
Determining Position of Guide Markings on Distorted Image (S1320)
[0138] The distorted image 1430 is analyzed to obtain the distorted locations of the guide markings in the distorted image 1430. Referring to
Computing Positional Difference of Guide Marking (S1330)
[0139] In implementations, a positional difference between a guide marking on the photograph 1220 and a corresponding guide marking on the distorted marking image 1430 is computed based on their coordinates determined in the mark position determining processes S1320, S1160.
[0140] A positional difference is computed for each of the guide markings, and is used to determine whether the distorted image 1430 matches the markings page photograph 1220.
Computing Loss Representing Guide Marking Layout Difference (S1340)
[0141] A loss representing difference between the distorted image 1430 and the markings page photograph 1220 is computed. For example, a loss is computed based on the positional difference computed in the process S1330. For another example, a loss is computed based on difference between (1) a gap between two neighboring guide markings (e.g. g.sub.v2, g.sub.h2 shown in
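The positional-difference loss of process S1330/S1340 can be sketched as below. The mean-squared form is an assumption for illustration; the disclosure permits other loss formulations, such as the gap-based one.

```python
# Assumed sketch: the loss is the mean squared positional difference between
# each guide marking on the photograph and its counterpart on the simulated
# distorted image.

def layout_loss(photo_marks, distorted_marks):
    """Mean squared distance between corresponding guide-marking positions."""
    assert len(photo_marks) == len(distorted_marks)
    total = 0.0
    for (px, py), (dx, dy) in zip(photo_marks, distorted_marks):
        total += (px - dx) ** 2 + (py - dy) ** 2
    return total / len(photo_marks)

photo = [(100.0, 100.0), (200.0, 100.0)]       # markings on photograph 1220
distorted = [(103.0, 104.0), (200.0, 100.0)]   # markings on distorted image 1430
print(layout_loss(photo, distorted))  # (9 + 16 + 0) / 2 = 12.5
```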
Determining Matching Between Distorted Image and Markings Page Photograph (S1360)
[0142] The iterative process further includes determining whether the loss is less than a predetermined threshold, and thereby determining whether the layouts of the guide markings on the distorted image and on the photograph of the markings page match. When the loss is less than the predetermined threshold, it can be determined that the distorted image 1430 matches the markings page photograph 1220. When the distorted image 1430 matches the markings page photograph 1220, it is determined that the current estimation of parameters explains the page's distortion, and it is expected that an image flattening process applied to the photograph 1220 using the current estimation would generate an undistorted, flat, rectangular version of the book page.
[0143] When the loss is greater than the predetermined threshold, it can be determined that the current set of estimated parameters is not good enough to explain the page's distortion in the markings page photograph 1220.
Updating Estimated Parameters (S1360)
[0144] When the loss is greater than the predetermined threshold, one or more of the estimated parameters are updated. For example, an estimation value for a parameter is updated based on a partial derivative of the loss with respect to the parameter. A Newton-Raphson method can be used to update one or more parameters. In the alternative, one or more mathematical methods different from the example can be used to update the estimation of parameters.
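A one-parameter illustration of the Newton-Raphson update is sketched below. The numerical differentiation and the toy quadratic loss are assumptions; the disclosure does not fix a specific differentiation scheme.

```python
# Assumed sketch: refine a parameter estimate with one Newton-Raphson step
# on the derivative of the loss, using central finite differences.

def newton_update(loss, p, h=1e-4):
    """One Newton-Raphson step of parameter p toward a minimum of loss(p)."""
    d1 = (loss(p + h) - loss(p - h)) / (2 * h)              # first derivative
    d2 = (loss(p + h) - 2 * loss(p) + loss(p - h)) / h**2   # second derivative
    return p - d1 / d2

# Toy quadratic loss with its minimum at p = 3; one Newton step reaches it
# (up to numerical error) from any starting estimate.
loss = lambda p: (p - 3.0) ** 2
p = newton_update(loss, 10.0)
print(p)  # close to 3.0, the loss minimum
```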
Associate Current Set of Estimated Parameters with Input Image (S1370)
[0145] When the computed loss is less than the predetermined threshold, the current set of estimated parameters is stored in association with the input image 1010 as the output data 1020.
Obtaining Image Correction Parameter From Corrected Image
[0146] Subsequent to determining one or more image correction parameters according to the process of
[0147] In embodiments, when the camera 200 is modeled using an ideal pinhole camera model, the focal length parameter (f.sub.scale) represents the camera's z-axis translation and an additional parameter representing the camera's z-axis translation would not be necessary as an image correction parameter (as an output of the model 320). Then, the flattened image is analyzed to obtain a page width ratio (Δu/u.sub.1) and a page height ratio (Δv/v.sub.1) explained with reference to
Input-Output Data Generation Using Simulation
[0148] In an implementation, a simulation process can be used to generate input-output data for training a machine-trainable model without printing guide markings on a book and taking a photograph of a markings page. In an example simulation process, a set of output parameters (output data for training) is determined first, without reference to an image featuring a curved book page. The corresponding input data is then generated using the determined set of output parameters. The corresponding input data (an image featuring a curved book page, or a modified version thereof) is generated by distorting an image of a flat book page (available from scanning a flat page or from virtually created data of a flat book page) based on the determined set of output parameters, according to the mathematical models and assumptions used for defining the image correction parameters (e.g. pinhole camera model, assumption of a Bezier curve line). The simulation process does not require an iteration process of
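The simulation path can be sketched as follows. The parameter names and the toy forward distortion below are assumptions for illustration; a full implementation would render a distorted page image rather than only displace marking coordinates.

```python
import random

# Assumed sketch of simulation-based data generation: output parameters are
# sampled first, and the training input is then produced by forward-distorting
# known flat-page marking coordinates with those parameters.

def sample_parameters(rng):
    """Sample a plausible set of image-correction (output) parameters."""
    return {"curl": rng.uniform(0.0, 0.3), "tilt": rng.uniform(-0.1, 0.1)}

def forward_distort(flat_marks, params):
    """Toy forward model: displace each flat mark according to the parameters."""
    return [(x + params["tilt"] * y, y + params["curl"] * x * (1 - x))
            for x, y in flat_marks]

rng = random.Random(0)
flat_marks = [(x / 4, y / 4) for x in range(5) for y in range(5)]  # 5x5 grid

params = sample_parameters(rng)                     # output data (label)
input_marks = forward_distort(flat_marks, params)   # basis of the input image
print(len(input_marks), sorted(params))  # 25 ['curl', 'tilt']
```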
Training of Model—Supervised Learning
[0149] Once input-output data pairs are prepared, one or more supervised learning techniques are used to prepare the machine-trained model 320. In embodiments, any known learning technique can be applied to the training of the model 320 as long as the technique can configure the model 320 to output, in response to training input images, parameters that are within a predetermined allowable error range from desirable output parameters (labels) of the training input images.
Structure of Machine-Trained Model—Convolutional Neural Network
[0150] In an implementation, a convolutional neural network (CNN) is used to construct the machine-trained model 320. In general, a convolutional neural network requires a smaller number of model parameters when compared to a fully connected neural network. In an implementation, a neural network other than a CNN can be used for the machine-trained model 320.
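The parameter-count advantage of a convolutional layer can be checked with a back-of-the-envelope calculation. The layer sizes below are assumptions chosen to match the 192×144 input image; they are not taken from the disclosure.

```python
# Assumed sizes: one conv layer versus one fully connected layer on the
# same 192x144 single-channel input, both producing 16 output channels.
H, W, C_IN, C_OUT, K = 144, 192, 1, 16, 3  # image size, channels, kernel size

# Conv layer: weights are shared across all spatial positions.
conv_params = K * K * C_IN * C_OUT + C_OUT  # kernels + biases

# Fully connected layer producing an output of the same spatial size.
fc_params = (H * W * C_IN) * (H * W * C_OUT) + H * W * C_OUT

print(conv_params)        # 160
print(f"{fc_params:,}")   # over 12 billion
assert conv_params < fc_params
```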
Processing by Smartphone and Remote Server
[0151] One or more processes of the present disclosure can be performed by the smartphone 200, by a remote server, or by the smartphone and the remote server in combination. For example, when the smartphone 200 does not have the machine-trained model 320 on its local data store, the smartphone 200 transmits the input image 310 to a remote server such that the remote server runs the machine-trained model 320. For another example, the process of
Example Architecture of User Computing System
[0152]
[0153] As illustrated, the computing device 1500 includes a processor 1510, a network interface 1520, a computer readable medium 1530, and an input/output device interface 1540, all of which may communicate with one another by way of a communication bus. The network interface 1520 may provide connectivity to one or more networks or computing systems. The processor 1510 may also communicate with memory 1550 and further provide output information for one or more output devices, such as a display (e.g., display 1541), speaker, etc., via the input/output device interface 1540. The input/output device interface 1540 may also accept input from one or more input devices, such as a camera 1542 (e.g., 3D depth camera), keyboard, mouse, digital pen, microphone, touch screen, gesture recognition system, voice recognition system, accelerometer, gyroscope, etc.
[0154] The memory 1550 may contain computer program instructions (grouped as modules in some implementations) that the processor 1510 executes in order to implement one or more aspects of the present disclosure. The memory 1550 may include RAM, ROM, and/or other persistent, auxiliary, or non-transitory computer-readable media.
[0155] The memory 1550 may store an operating system 1551 that provides computer program instructions for use by the processor 1510 in the general administration and operation of the computing device 1500. The memory 1550 may further include computer program instructions and other information for implementing one or more aspects of the present disclosure.
[0156] In one implementation, for example, the memory 1550 includes a user interface module 1552 that generates user interfaces (and/or instructions therefor) for display, for example, via a browser or application installed on the computing device 1500. In addition to and/or in combination with the user interface module 1552, the memory 1550 may include an image processing module 1553 and a machine-trained model 1554 that may be executed by the processor 1510. The operations and algorithms of the modules are described in greater detail above with reference to
[0157] Although a single processor, a single network interface, a single computer readable medium, a single input/output device interface, a single memory, a single camera, and a single display are illustrated in the example of
Other Considerations
[0158] Logical blocks, modules or units described in connection with implementations disclosed herein can be implemented or performed by a computing device having at least one processor, at least one memory and at least one communication interface. The elements of a method, process, or algorithm described in connection with implementations disclosed herein can be embodied directly in hardware, in a software module executed by at least one processor, or in a combination of the two. Computer-executable instructions for implementing a method, process, or algorithm described in connection with implementations disclosed herein can be stored in a non-transitory computer readable storage medium.
[0159] Although the implementations of the inventions have been disclosed in the context of certain implementations and examples, it will be understood by those skilled in the art that the present inventions extend beyond the specifically disclosed implementations to other alternative implementations and/or uses of the inventions and obvious modifications and equivalents thereof. In addition, while a number of variations of the inventions have been shown and described in detail, other modifications, which are within the scope of the inventions, will be readily apparent to those of skill in the art based upon this disclosure. It is also contemplated that various combinations or sub-combinations of the specific features and aspects of the implementations may be made and still fall within one or more of the inventions. Accordingly, it should be understood that various features and aspects of the disclosed implementations can be combined with or substituted for one another in order to form varying modes of the disclosed inventions. Thus, it is intended that the scope of the present inventions herein disclosed should not be limited by the particular disclosed implementations described above, and that various changes in form and details may be made without departing from the spirit and scope of the present disclosure as set forth in the following claims.