REALTIME OBJECT MEASUREMENT
20180025246 · 2018-01-25
Inventors
Cpc classification
G06V30/15
PHYSICS
G06V30/224
PHYSICS
International classification
Abstract
A system and process of nearsighted (myopia) camera object detection involves detecting the objects through edge detection and outlining or thickening them with a heavy border. Thickening may include making the object bold in the case of text characters. The bold characters are then much more apparent and heavier weighted than the background. Thresholding operations are then applied (usually multiple times) to the grayscale image to remove all but the darkest foreground objects in the background resulting in a nearsighted (myopic) image. Additional processes may be applied to the nearsighted image, such as morphological closing, contour tracing and bounding of the objects or characters. The bound objects or characters can then be averaged to provide repositioning feedback for the camera user. Processed images can then be captured and subjected to OCR to extract relevant information from the image.
Claims
1. A method of communicating adjustments for relative positioning of a handheld electronic device with a camera, and a document, the method comprising: continuously receiving, at a processor of the handheld electronic device, a plurality of images of characters on the document; dynamically detecting edges of the characters while continuously receiving the images; thickening the edges of the characters; thresholding the edges of the characters; displaying, using a display of the handheld camera, the edges of the characters; determining an average font height of the characters using the edges of the characters; and determining relative positioning information about positioning of the handheld electronic device relative to the document.
2. The method of claim 1, further comprising displaying the relative positioning information.
3. The method of claim 1, further comprising determining a focal length range of the camera.
4. The method of claim 1, wherein dynamically detecting edges of the characters includes estimating a gradient of characters in the images.
5. The method of claim 1, wherein dynamically detecting and thickening edges of the characters include using a Sobel operator.
6. The method of claim 1, wherein thickening the edges includes calculating a magnitude of a gradient of the detected edges of the characters.
7. The method of claim 1, wherein thresholding includes using an assumption of a foreground and background in the images.
8. The method of claim 1, wherein thresholding includes removing grayscale from a background of the source image.
9. The method of claim 1, wherein thresholding is repeated on at least one of the images until a nearsighted image is generated.
10. The method of claim 1, further comprising morphologically closing characters after thresholding.
11. The method of claim 1, further comprising determining a contour of the characters.
12. A computer program product comprising computer-executable instructions embedded on a non-transitory computer-readable medium, said computer-executable instructions for communicating adjustments for relative positioning of a handheld electronic device with a camera, and a document, the computer-executable instructions comprising: computer-executable instructions for dynamically detecting edges of characters on the document as a processor of the handheld electronic device continuously receives a plurality of images of the characters from the camera of the handheld electronic device; computer-executable instructions for thickening the edges of the characters; computer-executable instructions for thresholding the edges of the characters; computer-executable instructions for displaying, using a display of the handheld electronic device, the edges of the characters; computer-executable instructions for determining an average font height of the characters using the edges of the characters; and computer-executable instructions for determining relative positioning information about positioning of the handheld electronic device relative to the document.
13. The computer program product of claim 12, further comprising computer-executable instructions for displaying the relative positioning information on the display.
14. The computer program product of claim 12, wherein the computer-executable instructions for dynamically detecting and thickening edges of the characters include using a Sobel operator.
15. The computer program product of claim 12, wherein the computer-executable instructions for thickening the edges includes computer-executable instructions for calculating a magnitude of a gradient of the detected edges of the characters.
16. The computer program product of claim 12, wherein the computer-executable instructions for thresholding includes computer-executable instructions for using an assumption of a foreground and background in the images.
17. The computer program product of claim 12, wherein the computer-executable instructions for thresholding includes computer-executable instructions for removing grayscale from a background of the source image.
18. The computer program product of claim 12, wherein the computer-executable instructions for thresholding include computer-executable instructions for repeating the thresholding on at least one of the images until a nearsighted image is generated.
19. The computer program product of claim 12, further comprising computer-executable instructions for morphologically closing characters after thresholding.
20. The computer program product of claim 12, further comprising computer-executable instructions for determining a contour of the characters.
21. The computer program product of claim 12, wherein the computer-executable instructions for determining the contour includes computer-executable instructions for determining contour points.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE INVENTION
[0041] The present invention now will be described more fully hereinafter with reference to specific embodiments of the invention. Indeed, the invention can be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. As used in the specification, and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. The term "comprising" and variations thereof as used herein are used synonymously with the term "including" and variations thereof and are open, non-limiting terms. "Exemplary" means "an example of" and is not intended to convey an indication of a preferred or ideal embodiment. "Such as" is not used in a restrictive sense, but for explanatory purposes.
[0042] As will be appreciated by one skilled in the art, the methods and systems may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the methods and systems may take the form of a computer program product on a computer-readable storage medium having computer-readable program instructions (e.g., computer software) embodied in the storage medium. More particularly, the present methods and systems may take the form of web-implemented computer software. Any suitable computer-readable storage medium may be utilized including hard disks, CD-ROMs, optical storage devices, or magnetic storage devices.
[0043] The methods and systems are described with reference to block diagrams and flowchart illustrations of methods, systems, apparatuses and computer program products. It will be understood that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, respectively, can be implemented by computer program instructions. These computer program instructions may be loaded onto a handheld electronic device, a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data processing apparatus create a means for implementing the functions specified in the flowchart block or blocks.
[0044] These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including computer-readable instructions for implementing the function specified in the flowchart block or blocks. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions that execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.
[0045] Accordingly, blocks of the block diagrams and flowchart illustrations support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, can be implemented by special purpose hardware-based computer systems that perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.
[0046] Implementations of the present invention include a system and method for generating a myopic image that attenuates or eliminates background information and further processing the myopic image to create an OCR conditioned image that improves the likelihood of successful OCR processing. Generally, as shown in
[0047] As shown in
[0048] Some aspects of the present invention address this issue by providing (in a simplified description not necessarily capturing all possible permutations or complexities) a process for nearsighted or myopic capture of information that helps to exclude background objects. The nearsighted capture effectively blurs, attenuates and/or eliminates artefacts or other characters that are farther away than the document of interest, thus improving the accuracy of the OCR process.
[0049] Generally, the process of nearsighted (myopic) camera object detection involves detecting 16 the objects through edge detection and outlining or thickening 18 them with a heavy border. (Thickening may include making the object bold in the case of text characters.) The bold characters are then much more apparent and more heavily weighted than the background, which tends to be grayscale or at least blurred because it lies outside the preferred focal lengths. Thresholding 20 operations are then applied (optionally, multiple times) to the grayscale image to remove everything but the darkest foreground objects, resulting in a nearsighted (myopic) image.
[0050] Other aspects of the systems and methods also facilitate improved image capture by providing feedback 66 to the consumer on the positioning 50 of the foreground document 24 within an acceptable focal length of the handheld electronic device 22. Generally, the system and method facilitate positioning by continuously processing captured images, determining average character sizes of the indicia on those images and comparing them to expected font sizes. The handheld electronic device 22 then provides feedback 66 that can include visual cues (such as a slider bar and green or red status colors) on a display to guide the consumer in repositioning the camera relative to the document 24, haptic feedback, audible feedback, or combinations thereof.
[0051] As shown in
[0052] Despite the availability of other options, most implementations of the present invention are well suited for mobile electronic devices 22 including a camera 60 and generating source images 12 in the present. For example, the handheld electronic device 22 may be a phone with a camera capturing video (and multiple source images per second) of the foreground document 24.
[0053] As shown in
[0054] The convolution masks are represented by the following equations and/or pseudo-code:
TABLE-US-00001
int GX[3][3];
int GY[3][3];

/* 3x3 GX Sobel mask */
GX[0][0] = -1; GX[0][1] = 0; GX[0][2] = 1;
GX[1][0] = -2; GX[1][1] = 0; GX[1][2] = 2;
GX[2][0] = -1; GX[2][1] = 0; GX[2][2] = 1;

/* 3x3 GY Sobel mask */
GY[0][0] =  1; GY[0][1] =  2; GY[0][2] =  1;
GY[1][0] =  0; GY[1][1] =  0; GY[1][2] =  0;
GY[2][0] = -1; GY[2][1] = -2; GY[2][2] = -1;
[0055] The Sobel operator also calculates the magnitude of the gradient:
$$|G| = \sqrt{G_x^{2} + G_y^{2}}$$
[0056] Additional pseudo-code illustrates movement of the mask across the image, gradient approximation and other operations in full context.
TABLE-US-00002
sImage originalImage; // Input image
sImage edgeImage;     // Output edge image
int X, Y, I, J;
long SUM;

for(Y=0; Y<=(originalImage.rows-1); Y++) {
  for(X=0; X<=(originalImage.cols-1); X++) {
    long sumX = 0;
    long sumY = 0;

    /* Border pixels lack a full 3x3 neighborhood, so they are treated as non-edges */
    if(Y==0 || Y==originalImage.rows-1 || X==0 || X==originalImage.cols-1) {
      SUM = 0;
    } else {
      /*-------X GRADIENT APPROXIMATION------*/
      for(I=-1; I<=1; I++) {
        for(J=-1; J<=1; J++) {
          sumX = sumX + (int)( (*(originalImage.data + X + I + (Y + J)*originalImage.cols)) * GX[I+1][J+1]);
        }
      }

      /*-------Y GRADIENT APPROXIMATION-------*/
      for(I=-1; I<=1; I++) {
        for(J=-1; J<=1; J++) {
          sumY = sumY + (int)( (*(originalImage.data + X + I + (Y + J)*originalImage.cols)) * GY[I+1][J+1]);
        }
      }

      /*---GRADIENT MAGNITUDE APPROXIMATION (Myler p.218)----*/
      SUM = abs(sumX) + abs(sumY);
    }

    if(SUM>255) SUM=255;
    if(SUM<0) SUM=0;

    /* Invert so that edges are dark on a light background */
    *(edgeImage.data + X + Y*originalImage.cols) = 255 - (unsigned char)(SUM);
  }
}
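The sweep of the mask across the image can also be sketched in Python. This is a minimal illustration, not the implementation of the disclosure: it uses the standard Sobel coefficients, the function name `sobel_edges` and the `exact` flag are hypothetical, and leaving border pixels white is an assumption of the sketch.

```python
import math

# Standard 3x3 Sobel masks, indexed [row][column].
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[1, 2, 1],
      [0, 0, 0],
      [-1, -2, -1]]


def sobel_edges(image, exact=False):
    """Return an inverted edge image (dark edges on white).

    image is a list of rows of grayscale values in the range 0-255.
    With exact=False the magnitude is approximated as |Gx| + |Gy|;
    with exact=True it is sqrt(Gx^2 + Gy^2).
    """
    rows, cols = len(image), len(image[0])
    # Assumption: border pixels have no full neighborhood and stay white.
    out = [[255] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            sum_x = 0
            sum_y = 0
            for j in (-1, 0, 1):
                for i in (-1, 0, 1):
                    pixel = image[y + j][x + i]
                    sum_x += pixel * GX[j + 1][i + 1]
                    sum_y += pixel * GY[j + 1][i + 1]
            if exact:
                magnitude = math.hypot(sum_x, sum_y)
            else:
                magnitude = abs(sum_x) + abs(sum_y)
            # Clamp to 255, then invert so strong edges become black.
            out[y][x] = 255 - min(255, int(magnitude))
    return out
```

Because the clamped magnitude saturates quickly on high-contrast text, strong character edges come out fully black, which is the thickening effect described for the bold characters.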
[0057] Generally, then, the Sobel operator changes a pixel's value to the value of the mask output. Then it shifts one pixel to the right, calculates again, and continues to the right until it reaches the end of a row. The Sobel operator then starts at the beginning of the next row. As shown in
[0058] Another implementation of the Sobel operator uses the following kernel for noise reduction:
The kernel window is moved over the image with no scale or shift in delta. This kernel, for example, can be employed with the following variables submitted to the Sobel operator:
Sobel(in=inputImage, out=outputImage, GrayScale, x_order=1 and y_order=0, KernelSize=3, scale=1, delta shift=0, DrawSolidBorderOnEdge=IntensitySuroundingWindowPixelsMax)
wherein:
TABLE-US-00003
Rectangle rects[ ]   //- Rectangle Array
Image inputImage     //- Pumped in Video Frame
Image outputImage    //- Output Image after standard operations
Image outputImage2   //- Output Image after optional operations.
Kernel selection and size can be adjusted for different foreground object types, such as checks, receipts, business cards, etc. The inventors, however, determined the disclosed particular order of steps and kernel selection to be particularly effective.
[0059] As shown in
[0060]
Pseudocode of the Otsu thresholding is shown below:
TABLE-US-00004
// Calculate histogram
int ptr = 0;
while (ptr < srcData.length) {
  int h = 0xFF & srcData[ptr];
  histData[h] ++;
  ptr ++;
}

// Total number of pixels
int total = srcData.length;

float sum = 0;
for (int t=0 ; t<256 ; t++) sum += t * histData[t];

float sumB = 0;
int wB = 0;
int wF = 0;

float varMax = 0;
threshold = 0;

for (int t=0 ; t<256 ; t++) {
  wB += histData[t]; // Weight Background
  if (wB == 0) continue;

  wF = total - wB; // Weight Foreground
  if (wF == 0) break;

  sumB += (float) (t * histData[t]);

  float mB = sumB / wB; // Mean Background
  float mF = (sum - sumB) / wF; // Mean Foreground

  // Calculate Between Class Variance
  float varBetween = (float)wB * (float)wF * (mB - mF) * (mB - mF);

  // Check if new maximum found
  if (varBetween > varMax) {
    varMax = varBetween;
    threshold = t;
  }
}
[0061] The range of the histogram is 1 to 255 in grayscale intensity. Variables may be sent to the Otsu operator to set the histogram range:
Otsu_Threshold(in=outputImage, out=outputImage, Histogram_From=1, Histogram_To=255, BlackForegroundWhiteBackground).
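For readers who want to experiment with the global thresholding step, the following is a minimal Python sketch of Otsu's method followed by binarization. The function names `otsu_threshold` and `binarize` are hypothetical, the histogram here spans all 256 levels rather than a configurable range, and mapping values at or below the threshold to black is an assumption made to match a dark-foreground, light-background image.

```python
def otsu_threshold(pixels):
    """Return the gray level that maximizes between-class variance.

    pixels is a flat sequence of grayscale values in the range 0-255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p & 0xFF] += 1

    total = len(pixels)
    weighted_sum = sum(t * hist[t] for t in range(256))

    sum_b = 0.0      # running intensity sum of the background class
    w_b = 0          # running pixel count of the background class
    var_max = 0.0
    threshold = 0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (weighted_sum - sum_b) / w_f
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > var_max:
            var_max = var_between
            threshold = t
    return threshold


def binarize(pixels, threshold):
    """Map values at or below the threshold to black (0), others to white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]
```

On a cleanly bimodal frame, the returned threshold sits at the darker mode, so dark strokes are kept as black and the lighter background is washed out to white.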
[0062] Thresholding may additionally or alternatively include an adaptive thresholding 28 for strong edge segmentation. Adaptive thresholding using a small block size can result in erosion and highlighting of only the strongest edges. Adaptive thresholding beneficially can dynamically remove noise for the nearsighted camera operation. Adding the second (or additional) thresholding process segments the images, separating weak edges from strong edges.
[0063] For example, the destination pixel (dst) is calculated as the mask window is passed over the image:
where T(x,y) is a threshold calculated individually for each pixel.
The threshold value T(x,y) is a mean of the blockSize × blockSize neighborhood of (x,y) minus C.
With a small neighborhood, adaptive thresholding functions like adaptive edge detection, highlighting only the strongest edges.
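A mean-based adaptive threshold of this kind can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method of the disclosure: the function name `adaptive_mean_threshold` and its default parameters are hypothetical, the comparison direction (a pixel becomes white when it exceeds T(x,y)) is assumed, and the window is simply clipped at the image border.

```python
def adaptive_mean_threshold(image, block_size=7, c=5, max_value=255):
    """Threshold each pixel against the mean of its neighborhood minus c.

    image is a list of rows of grayscale values in the range 0-255.
    T(x, y) is the mean of the block_size x block_size window centered
    on (x, y), minus the constant c.
    """
    rows, cols = len(image), len(image[0])
    half = block_size // 2
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            total = 0
            count = 0
            # Assumption: the window is clipped where it leaves the image.
            for ny in range(max(0, y - half), min(rows, y + half + 1)):
                for nx in range(max(0, x - half), min(cols, x + half + 1)):
                    total += image[ny][nx]
                    count += 1
            t = total / count - c
            out[y][x] = max_value if image[y][x] > t else 0
    return out
```

With a small block, only pixels that are clearly darker than their immediate surroundings fall below the local threshold, which is why the operation behaves like an edge detector that keeps the strongest strokes.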
[0064] Generally, the adaptive thresholding 28 divides the image into a number of equal blocks. It calculates the threshold value inside each of the blocks. Then the mean value of all the blocks is calculated. Mean values below a threshold result in removal of blocks (left hand side of
wherein Ti is the threshold value of each block, is the mean of all blocks, n is the number of blocks.
[0065] Thus, as the block window is passed over the image, pixels are filled with black or removed with a fill of white depending on whether the block is primarily black or white. The adaptive thresholding then can be a form of thinning operation, leaving only the strongest edges, which generally should be foreground objects, such as characters 14 on the foreground object 24.
[0066] In one implementation, adaptive thresholding (or erosion) 28 is by way of a 7×7 pixel kernel. The thresholding uses the mean of the kernel pixels to determine black or white for the kernel window moving over the image after global segmentation by the Otsu operation. Thus, squares of 7×7 pixels are forced into black or white, such as is shown in the following variable selection for an adaptive threshold application:
TABLE-US-00005
BlockSize = 7
int Thresh_Kernel[BlockSize][BlockSize]
AdaptiveThresholdErosion(in = outputImage, out = outputImage2, Histogram_From = 1, Histogram_To = 255, Kernel = Thresh_Kernel, BlackBackgroundWhiteForeground_Inverse).
Generally, then, this thresholding operation completes washing out of the background to generate a nearsighted or myopic image.
[0067] Another thresholding operation may make a second, third or otherwise additional (or only) pass over the image. This operation may be optional based on the mean light level in the histogram. Additional thresholding can be skipped if the image is light already based on the mean light level in the histogram. This is demonstrated by pseudocode below:
[0068] BOOL TreatWithSecondPassErosionImage
The mean and standard deviation of the grayscale image are determined:
TABLE-US-00006
var Mean
var Stddev
get_meanStdDev(in = inputImage, out = Mean, out = Stddev)
The low extreme of the mean is set to determine whether to employ additional thresholding:
TABLE-US-00007
if( cvMean.val[0] < 120 && cvStddev.val[0] > 40 ) // Dark
{
  TreatWithSecondPassErosionImage = TRUE
}
else if( cvMean.val[0] >= 120 && cvMean.val[0] < 200 && cvStddev.val[0] < 40 ) // Medium
{
  TreatWithSecondPassErosionImage = TRUE
}
else if( cvMean.val[0] >= 200 && cvStddev.val[0] < 40 ) // Light
{
  TreatWithSecondPassErosionImage = FALSE
}
else // Anything else
{
  TreatWithSecondPassErosionImage = TRUE
}

// Use one or the other of the images
if(TreatWithSecondPassErosionImage == TRUE)
{
  outputImage = outputImage2
}
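The same decision can be expressed as a small, testable Python function. This is a sketch only: the function name `classify_frame` and the returned labels are hypothetical, while the cutoffs (120, 200 and 40) are taken from the pseudocode above. The population standard deviation is used here, which is an assumption about how the statistics are computed.

```python
import statistics


def classify_frame(gray_pixels):
    """Return (label, needs_second_pass) for a flat list of 0-255 values."""
    mean = statistics.fmean(gray_pixels)
    stddev = statistics.pstdev(gray_pixels)  # population standard deviation

    if mean < 120 and stddev > 40:
        return "dark", True
    if 120 <= mean < 200 and stddev < 40:
        return "medium", True
    if mean >= 200 and stddev < 40:
        # A light, low-contrast frame is already washed out.
        return "light", False
    return "other", True
```

Only the light, low-variance case skips the additional pass; every other combination of mean and standard deviation falls through to a second thresholding.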
[0069] In any case, the resulting myopic image is then ready for the next phase of OCR processes and/or can be used to facilitate adjustment of the relative positioning of the object and mobile electronic device 22. Generally, computer vision algorithms are applied to the resulting image for improved accuracy in object size detection. The method may for example include morphological closing 30, contour tracing 32 and bounding 34 of the objects or characters 14, as shown in
[0070] The morphological closing 30 process uses a structural element to repair gaps in characters, as shown in
[0071] An exemplary structuring element is a 20×3 line segment, used to repair a cursive "j" character, as shown in
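Morphological closing is a dilation followed by an erosion with the same structuring element. The Python sketch below illustrates this on a binary image where 1 marks foreground. The function names are hypothetical, the structuring element is a filled rectangle with odd height and width so that it can be centered on each pixel (an assumption that differs from the exemplary element size), and pixels outside the image are treated so that the border does not affect either step.

```python
def _morph(image, height, width, combine, outside):
    """Apply min or max over a centered height x width window."""
    rows, cols = len(image), len(image[0])
    half_h, half_w = height // 2, width // 2
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            window = [
                image[ny][nx] if 0 <= ny < rows and 0 <= nx < cols else outside
                for ny in range(y - half_h, y + half_h + 1)
                for nx in range(x - half_w, x + half_w + 1)
            ]
            out[y][x] = combine(window)
    return out


def dilate(image, height, width):
    """Grow foreground: a pixel is set if any pixel in the window is set."""
    return _morph(image, height, width, max, outside=0)


def erode(image, height, width):
    """Shrink foreground: a pixel stays set only if the whole window is set."""
    return _morph(image, height, width, min, outside=1)


def morph_close(image, height, width):
    """Closing: dilation followed by erosion with the same element."""
    return erode(dilate(image, height, width), height, width)
```

For example, closing a vertical stroke that has a one-pixel break with a 3×1 element fills the break while leaving the rest of the stroke, and the empty columns beside it, unchanged.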
[0072] The contour tracing 32 process gathers objects and sizes. These objects and sizes are used to determine the average text object size on the foreground document 24. The contour tracing 32 process includes detection of edges that yield contours of the underlying object. Generally, the objects with contours will be closed objects. The matrix of a particular image includes trees or lists of elements that are sequences. Every entry into the sequence encodes information about the location of the next point of the object or character.
[0073]
[0074] An exemplary process for contour tracing 32 includes using the Suzuki and Abe algorithm. Generally, the algorithm determines topological information about contours of objects using hierarchical border following.
[0075] Contour tracing 32 also can include a shape approximation process. Assuming that most contour points form polygonal curves with multiple vertices, the shape can be approximated with a less complex polygon. The shape approximation process may include, for example, the Ramer-Douglas-Peucker (RDP) algorithm. The RDP algorithm finds similar curves with fewer points with a dissimilarity less than or equal to a specific approximation accuracy. The shape approximation process facilitates bounding 34 by reducing the contours of the characters to simple polygon closed shapes.
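The Ramer-Douglas-Peucker simplification can be sketched in a few lines of Python. This is a minimal recursive version for an open polyline; the function names and the `epsilon` parameter name are hypothetical, and a closed contour would need to be split into open segments before being passed in.

```python
import math


def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def rdp(points, epsilon):
    """Simplify a polyline so no dropped point is farther than epsilon."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax > epsilon:
        # Keep the farthest point and simplify both halves recursively.
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right
    return [first, last]
```

A larger `epsilon` discards more vertices, so the approximation accuracy directly controls how simple the resulting polygon is before bounding.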
[0076] In one implementation, the following variables are submitted to the Suzuki and Abe application:
TABLE-US-00008
Objects objects[ ]    //- array of objects
Objects objects2[ ]   //- array of objects meeting filtered size and component
FindObjects( in = outputImage, out = objects, FindOutsideOnlyContour)
Notably, this submission is only concerned with the outside shape of the objects to allow them to be bound within another shape, such as a box which represents the minimum and maximum x and y pixel coordinates of the object.
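To make the idea of a box spanning the minimum and maximum x and y pixel coordinates concrete, the following Python sketch computes one bounding box per object in a binary image. Note that it swaps in a simple breadth-first flood fill over 8-connected pixels in place of border following, so it is not the Suzuki and Abe algorithm; the function name `bounding_boxes` and the `(x, y, width, height)` tuple layout are hypothetical.

```python
from collections import deque


def bounding_boxes(binary):
    """Return (x, y, width, height) for each 8-connected foreground object.

    binary is a list of rows where nonzero values mark foreground.
    Boxes are reported in raster-scan order of their first pixel.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for y in range(rows):
        for x in range(cols):
            if not binary[y][x] or seen[y][x]:
                continue
            min_x = max_x = x
            min_y = max_y = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < cols and 0 <= ny < rows
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes
```

Because only the extent of each object matters for the box, interior holes have no effect on the result, which mirrors the outside-only contour choice described above.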
[0077] The bounding 34 process places a peripheral boundary around each character and around each row of characters 14. For example, a bounding row box or rectangle 34 can be placed around each character (as shown in
[0078] The bounding 34 process calculates and returns the minimal up-right bounding rectangle 34 for the specified point in an approximated contour for an object or character. The contour of the object is used to approximate a row of text objects. The heights of the rows are then averaged to get an average character font height for the document. In exemplary pseudocode, the process submits variables for averaging the height and returning an average object size height:
TABLE-US-00009
long heightSum = 0
double fontScale = 0
for(int i=0; i < rects.size( ); i++) {
  heightSum += rects[i].height;
}
if(rects.size( ) > 1 ) {
  // Cast to double so the average is not truncated by integer division
  fontScale = (double)heightSum / rects.size( )
}.
[0079] Optionally, the bounding 34 process may include a filter that excludes objects of certain size parameters. For example, polygon objects with fewer than 2 or 3 components may be excluded. A more complex filter of objects outside a 2 to 19 font size is shown by the following pseudocode:
TABLE-US-00010
for(int i = 0; i < objects2.size( ); i++ ) {
  // When we move the camera far away,
  // the bounding rectangle can become 2 lines combined
  // filter these out
  if ( (objects2[i].Rect.width / 1.5 ) > objects2[i].Rect.height) {
    // Keep objects that are 2 pixels to 19 pixels in size
    if(objects2[i].Rect.height > 1 && objects2[i].Rect.height < 20 ) {
      rects.add(objects2[i].Rect);
    }
  }
}
wherein the filter blocks rectangles around objects whose width is not at least 50% larger than their height. Also, the filter may exclude objects (characters) that have a size less than 2 pixels or greater than 19 pixels. Although other filter parameters are possible, the inventors have found that these parameters work well for images of financial documents such as receipts.
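The filtering and averaging steps can be combined in a short Python sketch. The function names and the `(x, y, width, height)` tuple layout are hypothetical; the width-to-height ratio of 1.5 and the 2 to 19 pixel height limits come from the pseudocode above. Unlike the averaging pseudocode, this sketch returns an average whenever at least one rectangle survives the filter, which is a choice made for the illustration.

```python
def filter_rects(rects, min_height=2, max_height=19, aspect=1.5):
    """Keep row rectangles that are wide enough and of plausible height.

    rects is a list of (x, y, width, height) tuples in pixels.
    """
    kept = []
    for x, y, width, height in rects:
        wide_enough = width / aspect > height
        plausible = min_height <= height <= max_height
        if wide_enough and plausible:
            kept.append((x, y, width, height))
    return kept


def average_height(rects):
    """Return the mean rectangle height, or 0.0 when the list is empty."""
    if not rects:
        return 0.0
    return sum(height for _, _, _, height in rects) / len(rects)
```

For instance, given rows of height 10 and 12 pixels plus a square blob, an oversized merged block and a one-pixel sliver, only the two rows survive the filter and the average height is 11 pixels.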
[0080] In another aspect of the present invention, as shown in
[0081]
[0082] The slider bar 44 shows a range of relative positioning, within the center bar, in which the slider may fall and still be within the preferred focal length of the camera. At a frame rate of 20 or 30 frames per second, the slider would readjust based on the current relative positioning. Moving too far out or in would cause the slider to move down or up outside the center bar and/or the center bar to flash a red color. When within the preferred range, the slider bar and center bar may turn green to signal that the image is ready for capturing and further processing.
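One way to drive such a slider is to map the measured average character height onto a normalized offset relative to a preferred range. The Python sketch below illustrates the idea only: the function name, the preferred range of 8 to 14 pixels, the hint strings, and the assumption that small characters mean the camera is too far away are all hypothetical and are not taken from the disclosure.

```python
def positioning_feedback(avg_height, min_height=8.0, max_height=14.0):
    """Return (offset, status, hint) for one processed frame.

    offset is 0.0 at the middle of the preferred range and reaches
    -1.0 or +1.0 at its edges; values beyond that fall outside the
    center bar. offset is None when no characters were measured.
    """
    if avg_height <= 0:
        return None, "red", "no text detected"
    center = (min_height + max_height) / 2.0
    half_range = (max_height - min_height) / 2.0
    offset = (avg_height - center) / half_range
    if offset < -1.0:
        # Assumption: characters that are too small mean the camera is too far.
        return offset, "red", "move closer"
    if offset > 1.0:
        return offset, "red", "move farther away"
    return offset, "green", "hold steady"
```

Calling this once per processed frame would let the display update the slider position and status color at the video frame rate.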
[0083] The process of measuring the size of objects such as text fonts in real-time using a mobile electronic device (such as a video camera on a smart phone, tablet or some other moveable electronic or computing device with access to processing power) allows for a wide range of applications. Captured images have improved sizing and resolution for later comparisons in applications such as OCR or virtual reality marker detection. The advantages of this process are not limited to OCR. Any comparison-based computer vision application will benefit when a known-size object is presented before processing. The approach being presented here operates in real-time at 20 to 30 fps on a mobile device, allowing for user feedback to get the optimal focal length and object size during image capture. This process is set apart from any other attempts by an accuracy of 1 inch (25.4 mm) while detecting nearsighted objects on a document or foreground.
[0084] Referring now to
[0085] In one embodiment, the processor is in communication with or includes memory 220, such as volatile and/or non-volatile memory that stores content, data or the like. For example, the memory 220 may store content transmitted from, and/or received by, the entity. Also for example, the memory 220 may store software applications, instructions or the like for the processor to perform steps associated with operation of the entity in accordance with embodiments of the present invention. In particular, the memory 220 may store software applications, instructions or the like for the processor to perform the operations described above with regard to
[0086] In addition to the memory 220, the processor 210 can also be connected to at least one interface or other means for displaying, transmitting and/or receiving data, content or the like. In this regard, the interface(s) can include at least one communication interface 230 or other means for transmitting and/or receiving data, content or the like, as well as at least one user interface that can include a display 240 and/or a user input interface 250. The user input interface, in turn, can comprise any of a number of devices allowing the entity to receive data such as a keypad, a touch display, a joystick, a camera or other input device.
[0087] Reference is now made to
[0088] The handheld electronic device 22 includes various means for performing one or more functions in accordance with embodiments of the present invention, including those more particularly shown and described herein. It should be understood, however, that the mobile station may include alternative means for performing one or more like functions, without departing from the spirit and scope of the present invention. More particularly, for example, as shown in
[0089] As one of ordinary skill in the art would recognize, the signals provided to and received from the transmitter 304 and receiver 306, respectively, may include signaling information in accordance with the air interface standard of the applicable cellular system and also user speech and/or user generated data. In this regard, the mobile station can be capable of operating with one or more air interface standards, communication protocols, modulation types, and access types. More particularly, the mobile station can be capable of operating in accordance with any of a number of second-generation (2G), 2.5G, 3G, 4G, 4G LTE communication protocols or the like. Further, for example, the mobile station can be capable of operating in accordance with any of a number of different wireless networking techniques, including Bluetooth, IEEE 802.11 WLAN (or Wi-Fi), IEEE 802.16 WiMAX, ultra wideband (UWB), and the like.
[0090] It is understood that the processor 308, controller or other computing device, may include the circuitry required for implementing the video, audio, and logic functions of the mobile station and may be capable of executing application programs for implementing the functionality discussed herein. For example, the processor may be comprised of various means including a digital signal processor device, a microprocessor device, and various analog to digital converters, digital to analog converters, and other support circuits. The control and signal processing functions of the mobile device are allocated between these devices according to their respective capabilities. The processor 308 thus also includes the functionality to convolutionally encode and interleave message and data prior to modulation and transmission. Further, the processor 308 may include the functionality to operate one or more software applications, which may be stored in memory. For example, the controller may be capable of operating a connectivity program, such as a conventional Web browser. The connectivity program may then allow the mobile station to transmit and receive Web content, such as according to HTTP and/or the Wireless Application Protocol (WAP), for example.
[0091] The mobile station may also comprise means such as a user interface including, for example, a conventional earphone or speaker 310, a ringer 312, a microphone 314, a display 316, all of which are coupled to the processor 308. The user input interface, which allows the mobile device to receive data, can comprise any of a number of devices allowing the mobile device to receive data, such as a keypad 318, a touch display (not shown), a microphone 314, or other input device. In embodiments including a keypad, the keypad can include the conventional numeric (0-9) and related keys (#, *), and other keys used for operating the mobile station and may include a full set of alphanumeric keys or set of keys that may be activated to provide a full set of alphanumeric keys. Although not shown, the mobile station may include a battery, such as a vibrating battery pack, for powering the various circuits that are required to operate the mobile station, as well as optionally providing mechanical vibration as a detectable output.
[0092] The mobile station can also include means, such as memory including, for example, a subscriber identity module (SIM) 320, a removable user identity module (R-UIM) (not shown), or the like, which may store information elements related to a mobile subscriber. In addition to the SIM, the mobile device can include other memory. In this regard, the mobile station can include volatile memory 322, as well as other non-volatile memory 324, which can be embedded and/or may be removable. For example, the other non-volatile memory may be embedded or removable multimedia memory cards (MMCs), secure digital (SD) memory cards, Memory Sticks, EEPROM, flash memory, hard disk, or the like. The memory can store any of a number of pieces or amount of information and data used by the mobile device to implement the functions of the mobile station. For example, the memory can store an identifier, such as an international mobile equipment identification (IMEI) code, international mobile subscriber identification (IMSI) code, mobile device integrated services digital network (MSISDN) code, or the like, capable of uniquely identifying the mobile device. The memory can also store content. The memory may, for example, store computer program code for an application and other computer programs. For example, in one embodiment of the present invention, the memory may store computer program code for performing the processes associated with
[0093] While the methods and systems have been described in connection with preferred embodiments and specific examples, it is not intended that the scope be limited to the particular embodiments set forth, as the embodiments herein are intended in all respects to be illustrative rather than restrictive.
[0094] Unless otherwise expressly stated, it is in no way intended that any method set forth herein be construed as requiring that its steps be performed in a specific order. Accordingly, where a method claim does not actually recite an order to be followed by its steps or it is not otherwise specifically stated in the claims or descriptions that the steps are to be limited to a specific order, it is in no way intended that an order be inferred, in any respect. This holds for any possible non-express basis for interpretation, including: matters of logic with respect to arrangement of steps or operational flow; plain meaning derived from grammatical organization or punctuation; the number or type of embodiments described in the specification.
[0095] Implementations of the present invention provide many advantages. Measurement of the distance of the lens from the paper facilitates capture of a font object size for improved clarity. The improved clarity results in improved OCR recognition rates as compared to freehand capture of the image. Implementations also provide an ability to calculate optimal font size for OCR detection on a live video feed while accounting for optimal focus and clarity. Implementations of the present invention can measure and record optimal focal length and OCR font size ranges on raw video feed. These measurements can be used to guide the camera user through visual cues and indicators to move the camera to the best location in space. This produces a better OCR-compatible image for text recognition. The focal ratio determines how much light is picked up by the CCD chip in a given amount of time. The number of pixels in the CCD chip will determine the size of a font text character matrix. More pixels means a bigger font size, regardless of the physical size of the pixels. OCR engines have an expected and optimal size range for character comparison. When fonts are in the optimal range and have clear, crisp, well-defined edges, OCR detection and accuracy are improved. Implementations of the present invention provide guidance to that optimal range.
[0096] It will be apparent to those skilled in the art that various modifications and variations can be made without departing from the scope or spirit. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit being indicated by the following claims.